
CN106446126B - Mass spatial information data storage management method and storage management system - Google Patents


Info

Publication number
CN106446126B
Authority
CN
China
Prior art keywords
file
data
storage
distributed
storage management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610832422.4A
Other languages
Chinese (zh)
Other versions
CN106446126A (en)
Inventor
王景光
邹同元
褚鹏飞
沈洋
侯伟
刘宗玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Space Star Data System Technology Co ltd
Original Assignee
Harbin Space Star Data System Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Space Star Data System Technology Co ltd
Priority to CN201610832422.4A
Publication of CN106446126A
Application granted
Publication of CN106446126B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a storage management method for massive spatial information data, comprising the following steps: establishing a distributed file system in master-slave mode, together with a storage management node that sits under the distributed file system and performs full-life-cycle management of spatial information data; creating a distributed storage cluster, formed by a plurality of data storage nodes and able to expand horizontally and smoothly, that stores file data in a distributed manner with the file block as the basic unit under the instructions of the storage management node; and having the data storage nodes adopt a distributed storage strategy implemented with a Hash modulo algorithm. The invention solves the problem that existing storage management methods cannot effectively store and manage spatial information data that is massive, multi-source and heterogeneous.

Description

Mass spatial information data storage management method and storage management system
Technical Field
The invention relates to the field of mass spatial data storage, and in particular to a mass spatial information data storage management method and a storage management system.
Background
At present, China's satellite technology is well developed, but satellite application data in some regions lacks integration, and overall planning and information fusion are seriously insufficient. The storage and management of spatial information data, for example, faces many problems. Spatial information data is increasingly diverse (space-based, airborne, near-space, etc.) and very large: the multiplicity of sources makes the total data volume huge, with tens of millions of files to store, and high resolution means that a single high-resolution spatial information data file can reach several GB or even dozens of GB. Building a storage management system that can accommodate both the huge total volume and the large single-file size of civil spatial information data is therefore an inevitable trend.
Disclosure of Invention
The technical problem to be solved is that existing storage management methods cannot effectively store and manage spatial information data that is massive, multi-source and heterogeneous.
In view of this, embodiments of the present invention provide a mass spatial information data storage management method and storage management system to solve this technical problem.
The technical solution is as follows:
The invention provides a storage management method for massive spatial information data, comprising the following steps:
establishing a distributed file system in master-slave mode, together with a storage management node that sits under the distributed file system and performs full-life-cycle management of spatial information data;
creating a distributed storage cluster, formed by a plurality of data storage nodes and able to expand horizontally and smoothly, that stores file data in a distributed manner with the file block as the basic unit under the instructions of the storage management node;
the data storage nodes adopt a distributed storage strategy, which is implemented with a Hash modulo algorithm.
Further, the storage management node stores the file system metadata of the distributed file system and, whenever it detects an operation that modifies the metadata, records that operation in the edit log file;
before actual file data is stored on the data storage nodes, the whole file is segmented into file blocks of a predefined size, and each file block is assigned a globally unique handle. For example, when a file is created in the relational database management system (RDSMS), the storage management node records the file-creation log entry in the edit log file (EditLog), which it stores in its local file system. Pre-cutting file data into blocks of a predefined size and then distributing the blocks improves storage efficiency, and the file system metadata provides the record on which the organization and management of the distributed file system rely.
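The segmentation and edit-log steps above can be sketched in Python. This is a minimal, hypothetical illustration, not the patent's implementation: the class `StorageManagementNode`, the method `create_file`, the tuple format of the log entries, and the use of a random UUID as the "globally unique handle" are all assumptions. A tiny block size is passed in the usage line so that the split is visible.

```python
import uuid
from dataclasses import dataclass, field

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size given in the text


@dataclass
class StorageManagementNode:
    """Hypothetical in-memory stand-in for the storage management node (master)."""
    edit_log: list = field(default_factory=list)     # append-only record of metadata changes
    file_blocks: dict = field(default_factory=dict)  # file path -> ordered list of block handles

    def create_file(self, path: str, data: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> list:
        """Log the creation, cut the file into fixed-size blocks, and give each a unique handle."""
        # The metadata-modifying operation is appended to the edit log first.
        self.edit_log.append(("CREATE", path))
        blocks = []
        for offset in range(0, len(data), block_size):
            handle = uuid.uuid4().hex  # assumption: a random UUID serves as the global handle
            blocks.append((handle, data[offset:offset + block_size]))  # last block may be short
            self.edit_log.append(("ADD_BLOCK", path, handle))
        self.file_blocks[path] = [handle for handle, _ in blocks]
        # The (handle, bytes) pairs would then be sent to data storage nodes for storage.
        return blocks


if __name__ == "__main__":
    master = StorageManagementNode()
    blocks = master.create_file("/scenes/demo.tif", b"x" * 10, block_size=4)
    print([len(chunk) for _, chunk in blocks])  # [4, 4, 2]
```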
Furthermore, segmenting the whole file into file blocks of a predefined size and assigning each file block a globally unique handle, before actual file data is stored on the data storage nodes, is carried out by an external client through the storage management node;
the default file block size is 64 MB. This is far larger than the block size of a typical file system, and a larger block size has several advantages: it reduces interaction between the client and the storage management node; because a client is likely to perform multiple operations on a given block, keeping a long-lived connection to a data storage node reduces network load; and it shrinks the file system metadata held on the storage management node, so that the metadata can be kept in memory, improving management and storage efficiency.
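A quick calculation shows why a 64 MB default block keeps the master's metadata small enough for memory. The 20 GB file size and the 4 KB comparison block size are illustrative assumptions (4 KB is a common local file system block size), and `block_count` is a hypothetical helper.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3


def block_count(file_size: int, block_size: int) -> int:
    """Number of fixed-size blocks needed for a file; the last block may be partly filled."""
    return (file_size + block_size - 1) // block_size  # integer ceiling division


# A 20 GB scene, within the "several GB to dozens of GB" range cited in the background.
scene_size = 20 * GB
print(block_count(scene_size, 64 * MB))  # 320 block entries in the master's metadata
print(block_count(scene_size, 4 * KB))   # 5242880 entries with a 4 KB block
```

Since the storage management node keeps one metadata entry per block, the 64 MB setting cuts the entry count for this file by a factor of 16384.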
Further, every file block of the file data is replicated into multiple copies; the number of copies is called the copy factor. Both the block size and the copy factor are configurable per file: the copy factor can be set when the file is created and changed later. Setting the copy factor improves the fault tolerance of the system.
Further, on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: a redundant fragmentation space is added to the data storage nodes so that each file block is additionally replicated;
and if a data storage node is detected to have failed or disconnected, the storage management node continues to track the affected file block copies and initiates their re-replication. The storage management node is responsible for managing the replication of file blocks: when some storage nodes lose contact with it, it marks the unreachable nodes as failed, tracks the file block copies that must be replicated, and starts replicating them. When the copy factor of a file is reduced, the storage management node selects the surplus file block copies to delete and sends a deletion instruction to the relevant storage node at the next heartbeat; that storage node then removes the corresponding file block copy and frees the space. This arrangement effectively guarantees the security of the system's tile data and the reliability of the system.
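The replica bookkeeping described above (marking unreachable nodes as failed, re-replicating under-replicated blocks, and trimming surplus copies when the copy factor drops) might look roughly like the following. `ReplicaManager` and its methods are hypothetical names, target nodes are chosen in sorted order purely for determinism, and no real data transfer or heartbeat protocol is modeled.

```python
from collections import defaultdict


class ReplicaManager:
    """Hypothetical sketch of replica bookkeeping on the storage management node."""

    def __init__(self, nodes):
        self.live_nodes = set(nodes)
        self.replicas = defaultdict(set)  # block handle -> ids of nodes holding a copy
        self.factor = {}                  # block handle -> configured copy factor

    def add_block(self, block, factor, nodes):
        self.factor[block] = factor
        self.replicas[block].update(nodes)

    def mark_failed(self, node):
        """Mark an unreachable node as failed; return (block, target) re-replication tasks."""
        self.live_nodes.discard(node)
        tasks = []
        for block, holders in self.replicas.items():
            holders.discard(node)
            if not holders:
                continue  # no surviving copy to replicate from: the block is lost
            missing = self.factor[block] - len(holders)
            # Pick live nodes that do not already hold this block (sorted for determinism).
            for target in sorted(self.live_nodes - holders)[:max(missing, 0)]:
                tasks.append((block, target))  # copy from any surviving holder to target
                holders.add(target)
        return tasks

    def lower_factor(self, block, factor):
        """Reduce a block's copy factor; return surplus holders told to delete their copy."""
        self.factor[block] = factor
        surplus = sorted(self.replicas[block])[factor:]
        self.replicas[block] -= set(surplus)
        # A real node would send these deletion instructions with the next heartbeat.
        return surplus


manager = ReplicaManager(["n1", "n2", "n3", "n4"])
manager.add_block("b1", 3, ["n1", "n2", "n3"])
print(manager.mark_failed("n2"))       # [('b1', 'n4')]
print(manager.lower_factor("b1", 2))   # ['n4']
```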
Further, the namespace of the distributed file system, the block-to-file mapping, and the file attributes are all stored in the file system metadata, which is kept on the storage management node. A block is the smallest unit of storage and processing in a database; it contains header information and the data or PL/SQL code of the block itself.
Further, the distributed file system is implemented on the basis of a relational database management system (RDSMS);
a file content checksum mechanism is adopted to handle damage on a data storage node;
when a new RDSMS file is created, a checksum is calculated for each file block of the file data, and the checksums are saved as a separate hidden file on the storage management node. Massive remote sensing data are stored as independent files divided by frame, while the relational database management system indexes the file data, file blocks and so on, so that an application system can quickly locate the file data to be read. This approach makes managing massive data convenient, and capacity can be expanded by continually adding storage devices or storage nodes as the data volume requires. The checksums are actually stored in the namespace of the storage management node. The scheme addresses the case in which a file block on a storage node is damaged by a storage device error, a network error, a software defect in the storage node, or the like; in that case, the RDSMS uses the file content checksum mechanism to ensure data integrity.
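A per-block checksum check of the kind described above could be sketched as follows. The text does not name a checksum algorithm; CRC-32 from Python's standard `zlib` module is an assumption here, and the helper names `block_checksums`, `verify_block` and `read_verified` are hypothetical. Falling back to another replica on a mismatch is likewise an assumption that pairs the checksum with the copy factor described earlier.

```python
import zlib


def block_checksums(data: bytes, block_size: int) -> list:
    """One CRC-32 per fixed-size block, computed when the file is created."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def verify_block(block: bytes, expected: int) -> bool:
    """Recompute the checksum of a block read from a data storage node and compare."""
    return zlib.crc32(block) == expected


def read_verified(replicas: list, expected: int) -> bytes:
    """Return the first replica whose checksum matches; corrupt copies are skipped."""
    for copy in replicas:
        if verify_block(copy, expected):
            return copy
    raise IOError("every replica of this block failed checksum verification")


sums = block_checksums(b"abcdefgh", block_size=4)  # stored separately from the data
print(read_verified([b"abcX", b"abcd"], sums[0]))  # b'abcd' (first copy is corrupt)
```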
Furthermore, the distributed file system uses an information-separation hybrid indexing algorithm to index the related spatio-temporal information; the algorithm uses an improved hash table to index static information, dynamic historical spatio-temporal information, and current and future spatio-temporal information. This scheme addresses the cooperativity and survivability problems of massive spatial information.
Furthermore, the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data: horizontally, a high-resolution application is taken as the basic unit, realizing data organization across different types of spatial geographic resource bodies; vertically, management is carried out at multiple resolution levels, realizing horizontal and vertical management of multi-level resource bodies of different types.
The invention also provides a mass spatial information data storage management system that adopts any one of the methods disclosed above, comprising:
a distributed file system established in master-slave mode;
a storage management node located under the distributed file system, which performs full-life-cycle management of spatial information data and stores the file system metadata of the distributed file system;
a storage cluster formed by a plurality of data storage nodes, which can be expanded horizontally and smoothly and which, under the instruction control of the storage management node, stores file data in a distributed manner with the file block as the basic unit;
an external client coupled to the storage management node.
Beneficial effects of the invention: with the above technical solution, the invention achieves at least the following effects. Because a master-slave distributed file system is used, storage capacity is easy to expand: horizontal scalability of the whole storage system is readily achieved by increasing the capacity and/or the number of data storage nodes, and changes in the capacity and number of data storage nodes are hidden from the applications that access the data, which better meets the scalable storage requirements of massive spatial information data. Because the data storage nodes adopt a distributed storage strategy, each data storage node stores almost the same amount of data, and during parallel retrieval each node retrieves roughly the same number of tiles, which effectively improves the system's concurrent query performance and load balance. The Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement; its greatest advantage is excellent dispersion, which effectively keeps the system load balanced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
FIG. 1 is a flow chart of a mass spatial information data storage management method according to the present invention;
FIG. 2 is a schematic diagram of the mass spatial information data storage management system of the present invention;
FIG. 3 is a diagram illustrating the software architecture of a mass spatial information data storage management system according to the present invention;
FIG. 4 is a schematic diagram of a network deployment of the storage management system of the present invention.
Throughout the drawings, it should be noted that like reference numerals are used to depict the same or similar elements, features and structures.
Detailed Description
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. The following description includes various specific details to aid understanding, but these details are to be regarded as illustrative only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to their bibliographical meanings, but are used by the inventor only to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following descriptions of the various embodiments of the present disclosure are provided for illustration only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
It is to be understood that the singular forms also include the plural reference unless the context clearly dictates otherwise. Thus, for example, reference to a "component surface" includes reference to one or more such surfaces.
The first embodiment is as follows:
FIG. 1 is a flowchart of the mass spatial information data storage management method according to the present invention. Referring to FIG. 1, the present invention discloses a mass spatial information data storage management method comprising the following steps:
S1: establishing a distributed file system in master-slave mode, together with a storage management node that sits under the distributed file system and performs full-life-cycle management of spatial information data;
S2: creating a distributed storage cluster, formed by a plurality of data storage nodes and able to expand horizontally and smoothly, that stores file data in a distributed manner with the file block as the basic unit under the instructions of the storage management node;
S3: the data storage nodes adopt a distributed storage strategy, which is implemented with a Hash modulo algorithm.
The Hash modulo algorithm uses a hash function based on the modulo operation: the key is hashed and reduced modulo the table size, and the result is taken as the address of the hash table entry corresponding to the key. To distribute object keys uniformly over N nodes, the node that stores an object is determined by the result of Hash(key) % N.
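As a minimal illustration of the generic Hash(key) % N rule, before the tile-specific formula that follows, the sketch below maps string keys to N nodes. The CRC-32 hash and the function name `node_for_key` are assumptions; any stable hash would serve.

```python
import zlib
from collections import Counter


def node_for_key(key: str, n_nodes: int) -> int:
    """Hash(key) % N: map an object key to one of N nodes (0-based)."""
    # Python's built-in hash() of str is salted per process, so a stable hash is used instead,
    # ensuring every client computes the same node for the same key.
    return zlib.crc32(key.encode("utf-8")) % n_nodes


if __name__ == "__main__":
    load = Counter(node_for_key(f"tile-{i}", 9) for i in range(9000))
    print(sorted(load.items()))  # per-node key counts
```

The drawback of plain modulo placement is that changing N remaps most keys, which is one reason the text fixes K virtual nodes up front and maps those onto physical nodes separately.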
The core idea of the distribution strategy of the invention is borrowed from the Hash modulo algorithm, using the following formula:
H(key) = (3*row + col) % K
where H(key) is the fragmentation space number, K is the maximum number of virtual nodes, 0 ≤ H(key) < K, row is the row number of the tile's lower-left corner, and col is the column number of its lower-left corner. The fragmentation space number forms the top-level directory of the tile file tree structure.
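Under this formula, the 9 tiles of any 3 × 3 tile request receive 9 different fragmentation space numbers, because 3*row + col takes 9 consecutive values over such a window. A 3 × 3 request is a reasonable unit given the current tile size of 512 × 512 pixels and the mainstream display window size of 1280 × 1024 pixels, which corresponds to roughly 2.5 × 2 tiles.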
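The distinctness of a 3 × 3 window can be checked directly from H(key) = (3*row + col) % K. This sketch uses K = 255 from the text; the function names `fragment_space` and `window_spaces` are hypothetical.

```python
K = 255  # maximum number of virtual nodes (fragmentation spaces), as in the text


def fragment_space(row: int, col: int, k: int = K) -> int:
    """H(key) = (3*row + col) % K for the tile at (row, col)."""
    return (3 * row + col) % k


def window_spaces(row0: int, col0: int, size: int = 3, k: int = K) -> list:
    """Fragmentation space numbers of the size x size tile window starting at (row0, col0)."""
    return [fragment_space(row0 + r, col0 + c, k) for r in range(size) for c in range(size)]


print(sorted(window_spaces(10, 20)))  # [50, 51, 52, 53, 54, 55, 56, 57, 58]
print(window_spaces(84, 0))           # wraps past K: [252, 253, 254, 0, 1, 2, 3, 4, 5]
```

Because the row multiplier 3 equals the window width, the nine offsets 3*r + c for r, c in 0..2 are exactly 0 to 8, so a 3 × 3 window always lands on nine consecutive space numbers (modulo K). For larger windows the offsets overlap (for example, (1, 0) and (0, 3) both give 3), which is consistent with the weaker claim later in the text that n × n requests are only "relatively dispersed".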
To map fragmentation space numbers to actual physical nodes, suppose there are n physical nodes (n ≤ K). Each node is then allocated between (INT)(K/n) and (INT)(K/n) + 1 fragmentation space numbers, and consecutive space numbers on the same node differ by n. In the present invention, the maximum number of storage nodes K is 255 (this value is estimated from the actual storage volume and cannot be changed once set). With n = 9, the fragmentation space numbers are as shown in Table 1.
Table 1: Space slicing example (provided as an image in the original)
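The assignment rule stated above (consecutive space numbers on one node differ by n) amounts to a round-robin mapping, which generates the kind of layout Table 1 presents. This sketch assumes node i (1-based) holds space numbers i-1, i-1+n, i-1+2n, and so on, the layout used in Table 3; the function names are hypothetical.

```python
from collections import Counter


def node_of_space(space: int, n_nodes: int) -> int:
    """Physical node (1-based) for a space number: node i holds i-1, i-1+n, i-1+2n, ..."""
    return space % n_nodes + 1


def spaces_of_node(i: int, n_nodes: int, k: int) -> list:
    """All fragmentation space numbers (0..k-1) assigned to physical node i."""
    return list(range(i - 1, k, n_nodes))


K, n = 255, 9
counts = Counter(node_of_space(s, n) for s in range(K))
print(sorted(counts.items()))
# [(1, 29), (2, 29), (3, 29), (4, 28), (5, 28), (6, 28), (7, 28), (8, 28), (9, 28)]
print(spaces_of_node(1, n, K)[:4])  # [0, 9, 18, 27]
```

Since 255 = 9 × 28 + 3, the first three nodes receive one extra space number each, so every node holds 28 or 29 fragmentation spaces.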
With K = 255, the fragmentation space number stored for each tile is detailed in Table 2.
Table 2: tile space slicing
50 51 52 53 54
0 1 53 54
0 1 2 3 4
0 1 2 3 4 5 6 7
2 3 4 5 6 7 8 9 0 0 1
5 6 7 8 9 0 1 2 3 0 1 2 3 4
8 9 0 1 2 3 4 5 6 3 4 5 6 7
1 2 3 4 5 6 7 8 9 6 7 8 9 0
4 5 6 7 8 9 0 1 2 9 0 1 2 3
42 43 44 45 46 47 48 49 50 37 38 39 40 41
45 46 47 48 49 50 51 52 53 40 41 42 43 44
48 49 50 51 52 53 54 43 44 45 46 47
51 52 53 54 46 47 48 49 50
54 49 50 51 52 53
The whole table can be regarded as the original remote sensing image: each cell represents one tile of image data, i.e., one of the many tiles produced by cutting the original image. The table shows that for any 3 × 3 request, the 9 tiles involved are stored in different fragmentation spaces with consecutive numbers, so concurrent requests are spread as evenly as possible across the nodes, giving the highest degree of concurrency and the fastest response. Even for n × n requests, the tiles are spread across relatively many nodes rather than concentrated on a few, which preserves efficiency to some extent. Along each row the fragmentation space numbers run consecutively through all values from 0 to 254, and each fragmentation space stores the same number of tiles, so the number of tiles actually stored on each physical node is almost equal. Consequently, the tiles generated from a whole remote sensing image are dispersed evenly across the storage nodes by the hash distribution strategy of the invention, and each storage node stores approximately the same amount of tile data, meeting the requirements of query efficiency and load balancing.
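The balance claims above can be checked numerically. This sketch assumes a grid whose width is a multiple of K (here 255 columns by 10 rows) and the round-robin node mapping s % n + 1 with n = 9; the grid dimensions and function names are illustrative assumptions.

```python
from collections import Counter


def tiles_per_space(rows: int, cols: int, k: int = 255) -> Counter:
    """How many tiles of a rows x cols grid land in each fragmentation space."""
    return Counter((3 * r + c) % k for r in range(rows) for c in range(cols))


def tiles_per_node(rows: int, cols: int, k: int = 255, n_nodes: int = 9) -> Counter:
    """Tiles per physical node when space s lives on node s % n + 1."""
    return Counter((3 * r + c) % k % n_nodes + 1 for r in range(rows) for c in range(cols))


per_space = tiles_per_space(10, 255)
print(len(per_space), set(per_space.values()))  # 255 {10}
per_node = tiles_per_node(10, 255)
print(min(per_node.values()), max(per_node.values()))  # 280 290
```

The residual spread between 280 and 290 tiles comes from nodes holding 28 versus 29 fragmentation spaces, not from the hash itself.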
Preferably, in this embodiment, the storage management node stores the file system metadata of the distributed file system and, whenever it detects an operation that modifies the metadata, records that operation in the edit log file;
before actual file data is stored on the data storage nodes, the whole file is segmented into file blocks of a predefined size, and each file block is assigned a globally unique handle. Spatial information data is written once and read many times: once spatial information data products of different levels have been processed and generated, the data itself is in principle not allowed to be modified, and the file system metadata supports this. For example, when a file is created in the relational database management system (RDSMS), the storage management node records the file-creation log entry in the edit log file (EditLog), which it stores in its local file system. Pre-cutting file data into blocks of a predefined size and then distributing the blocks improves storage efficiency, and the file system metadata provides the record on which the organization and management of the distributed file system rely.
Preferably, in this embodiment, segmenting the whole file into file blocks of a predefined size and assigning each file block a globally unique handle, before actual file data is stored on the data storage nodes, is carried out by the external client through the storage management node;
the default file block size is 64 MB. This is far larger than the block size of a typical file system, and a larger block size has several advantages: it reduces interaction between the client and the storage management node; because a client is likely to perform multiple operations on a given block, keeping a long-lived connection to a data storage node reduces network load; and it shrinks the file system metadata held on the storage management node, so that the metadata can be kept in memory, improving management and storage efficiency.
Preferably, in this embodiment, every file block of the file data is replicated into multiple copies; the number of copies is called the copy factor. Both the block size and the copy factor are configurable per file: the copy factor can be set when the file is created and changed later. Setting the copy factor improves the fault tolerance of the system.
Preferably, in this embodiment, on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: a redundant fragmentation space is added to the data storage nodes so that each file block is additionally replicated;
and if a data storage node is detected to have failed or disconnected, the storage management node continues to track the affected file block copies and initiates their re-replication. When a local storage device or network device fails, the system can detect the error automatically and recover the data quickly and automatically. In this scheme, the storage management node is responsible for managing the replication of file blocks: when some storage nodes lose contact with it, it marks the unreachable nodes as failed, tracks the file block copies that must be replicated, and starts replicating them. When the copy factor of a file is reduced, the storage management node selects the surplus file block copies to delete and sends a deletion instruction to the relevant storage node at the next heartbeat; that storage node then removes the corresponding file block copy and frees the space. This arrangement effectively guarantees the security of the system's tile data and the reliability of the system.
Regarding the redundancy strategy mentioned above:
A distribution strategy based on the Hash modulo method guarantees the load balance of the system well, but it does not address system security and reliability: once a storage node fails, the tile data on that node is lost, causing huge losses to the system. Most distributed storage systems therefore use fault tolerance to increase reliability. Fault tolerance means tolerating faults: the system is allowed to fail, but the related functions and services must not fail when a fault occurs. Fault tolerance is usually designed in the form of replicas, which is in essence a redundancy strategy.
The basic idea of the redundancy strategy is a distributed redundant deployment scheme of "fragmentation space copies" built on the Hash modulo algorithm: redundant fragmentation spaces are added to each physical node so that the system holds copies of the data, which effectively improves fault tolerance and ensures the reliability and security of the distributed system. Specifically, two copies are added for each fragmentation space number obtained by the Hash modulo, and they are placed on the previous node and the next node respectively. Each tile is thus stored on three nodes (the original plus two copies); if the primary physical node of a fragmentation space fails, the data can be obtained from either of the two standby physical nodes. Accordingly, the fragmentation space number list of each node grows from one list to three, as shown in Table 3.
Table 3: Fragmentation space numbers held by the i-th physical storage node
Copies from node i-1 | Node i | Copies from node i+1
(i-1)-1 | i-1 | (i+1)-1
(i-1)-1+n | i-1+n | (i+1)-1+n
(i-1)-1+2n | i-1+2n | (i+1)-1+2n
(i-1)-1+3n | i-1+3n | (i+1)-1+3n
(i-1)-1+kn | i-1+kn | (i+1)-1+kn
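The three-list layout of Table 3 and the failover read path can be sketched as below. Wrapping around at the ends (node 1's previous node is node n, and node n's next node is node 1) is an assumption the text does not state, needed so that nodes 1 and n also get two neighbors; all function names are hypothetical.

```python
def holders(space: int, n_nodes: int) -> list:
    """[primary, previous, next] nodes (1-based) holding a fragmentation space."""
    primary = space % n_nodes + 1
    prev_node = (primary - 2) % n_nodes + 1  # assumption: node 1's previous node is node n
    next_node = primary % n_nodes + 1        # assumption: node n's next node is node 1
    return [primary, prev_node, next_node]


def spaces_held(i: int, n_nodes: int, k: int) -> dict:
    """The three space-number lists of node i, matching the columns of Table 3."""
    prev_node = (i - 2) % n_nodes + 1
    next_node = i % n_nodes + 1
    return {
        "copies_of_prev": list(range(prev_node - 1, k, n_nodes)),
        "primary": list(range(i - 1, k, n_nodes)),
        "copies_of_next": list(range(next_node - 1, k, n_nodes)),
    }


def read_node(space: int, n_nodes: int, failed: set) -> int:
    """Serve from the primary node, falling back to the two copies if it has failed."""
    for node in holders(space, n_nodes):
        if node not in failed:
            return node
    raise RuntimeError(f"space {space}: primary and both copies are unavailable")


print(holders(13, 9))                         # [5, 4, 6]
print(read_node(13, 9, {5}))                  # 4
print(spaces_held(5, 9, 255)["primary"][:3])  # [4, 13, 22]
```

For i = 5 the three lists start at 3, 4 and 5, matching (i-1)-1, i-1 and (i+1)-1 in the first row of Table 3.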
The multi-resolution tile data structure is currently a common strategy for storing and managing massive remote sensing image data. It makes full use of the multi-resolution image pyramid and image tiling techniques, can be applied effectively to the seamless organization and visualization of massive remote sensing data, and enables realistic image-based representation of the real world. High-resolution satellite remote sensing, with high spatial, temporal or spectral resolution, provides abundant data sources for making remote sensing quantitative, dynamic, networked, practical and industrialized, and for extracting ground-feature characteristics from remote sensing data.
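The pyramid-plus-tiling idea can be illustrated by counting tiles per level. This sketch assumes each level halves the resolution of the one below and uses the 512 × 512 pixel tile size mentioned earlier; the 30000 × 20000 pixel scene is a made-up example and `pyramid_levels` is a hypothetical helper.

```python
def pyramid_levels(width: int, height: int, tile: int = 512) -> list:
    """(width, height, tile_rows, tile_cols) per level, halving until one tile remains."""
    levels = []
    w, h = width, height
    while True:
        cols = -(-w // tile)  # ceiling division: partial tiles at the edge still count
        rows = -(-h // tile)
        levels.append((w, h, rows, cols))
        if rows == 1 and cols == 1:
            break
        w, h = max(1, -(-w // 2)), max(1, -(-h // 2))  # assumption: factor-2 downsampling
    return levels


levels = pyramid_levels(30000, 20000)
print(len(levels))  # 7
print(levels[0])    # (30000, 20000, 40, 59)
print(levels[-1])   # (469, 313, 1, 1)
```

Each (level, row, col) tile produced this way is the unit that the hash distribution strategy above places into a fragmentation space.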
Preferably, in this embodiment, the namespace of the distributed file system, the block-to-file mapping, and the file attributes are all stored in the file system metadata, which is kept on the storage management node. A block is the smallest unit of storage and processing in a database; it contains header information and the data or PL/SQL code of the block itself.
Preferably, the distributed file system is implemented on the basis of a relational database management system (RDBMS), and a file content checksum mechanism is used to deal with damage on a data storage node.

When a new RDBMS file is created, a checksum is calculated for each file block of the file data and saved as a separate hidden file on the storage management node. The checksums are therefore actually stored in the namespace of the storage management node. This mechanism addresses the case where a file block on a storage node is damaged by a storage device error, a network error, a software defect on that node, or a similar fault. In such cases, the RDBMS uses the file content checksums to ensure data integrity.

Massive remote sensing data are divided into frames (scenes), each stored in an independent file. The relational database management system indexes the file data, file blocks and related items, so that an application can quickly locate the file data it needs to read. This approach makes mass data easy to manage, and capacity can be expanded by continually adding storage devices or storage nodes as data volume grows.
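The following is a minimal sketch of the per-block checksum idea. Checksums are computed when a file is created and kept apart from the data, standing in for the hidden checksum file on the storage management node. They are then recomputed on read to locate corrupted blocks. CRC-32 from Python's `zlib` is an assumption, since the text does not name the checksum algorithm. The function names are also hypothetical.

```python
import zlib


def block_checksums(data: bytes, block_size: int) -> list[int]:
    """Compute a CRC-32 checksum for each block of the file data."""
    return [zlib.crc32(data[off:off + block_size]) for off in range(0, len(data), block_size)]


def corrupt_blocks(data: bytes, block_size: int, stored: list[int]) -> list[int]:
    """Return indices of blocks whose current checksum differs from the stored one."""
    current = block_checksums(data, block_size)
    if len(current) != len(stored):
        raise ValueError("block count does not match the stored checksum list")
    return [i for i, (now, saved) in enumerate(zip(current, stored)) if now != saved]


if __name__ == "__main__":
    original = bytes(range(256)) * 4            # 1 KiB of sample data
    hidden_checksums = block_checksums(original, 256)
    damaged = bytearray(original)
    damaged[300] ^= 0xFF                        # flip one byte inside block 1
    print(corrupt_blocks(bytes(damaged), 256, hidden_checksums))
```

A block flagged this way would then be read from one of its replicas on another data storage node instead.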
Preferably, the distributed file system uses an information-separation hybrid indexing algorithm to index the related spatio-temporal information. The algorithm uses an improved hash table to index static information, dynamic historical spatio-temporal information, and current and future spatio-temporal information. This scheme addresses the collaboration and survivability problems of massive spatial information.
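The text does not describe the "improved hash table" itself. The following sketch therefore uses plain Python dictionaries to illustrate only the information-separation idea, keeping three kinds of information in separate hash tables:

* static attributes, keyed by object;
* historical positions, keyed by (object, time bucket);
* the current state, keyed by object.

The hourly time buckets and the linear extrapolation used for "future" queries are assumptions, and every name is hypothetical.

```python
from collections import defaultdict


class SeparatedIndex:
    """Hypothetical information-separation index built from three hash tables."""

    def __init__(self, bucket_seconds: int = 3600):
        self.bucket_seconds = bucket_seconds
        self.static = {}                  # object id -> static attributes
        self.history = defaultdict(list)  # (object id, time bucket) -> [(t, x, y), ...]
        self.current = {}                 # object id -> (t, x, y, vx, vy)

    def put_static(self, obj_id: str, **attrs) -> None:
        self.static[obj_id] = attrs

    def report(self, obj_id: str, t: float, x: float, y: float,
               vx: float = 0.0, vy: float = 0.0) -> None:
        """Store a new position; the previous current state moves to the history table."""
        old = self.current.get(obj_id)
        if old is not None:
            ot, ox, oy = old[:3]
            self.history[(obj_id, int(ot // self.bucket_seconds))].append((ot, ox, oy))
        self.current[obj_id] = (t, x, y, vx, vy)

    def history_at(self, obj_id: str, t: float) -> list:
        """Historical positions recorded in the time bucket containing t."""
        return self.history.get((obj_id, int(t // self.bucket_seconds)), [])

    def predict(self, obj_id: str, t_future: float) -> tuple:
        """Future position by linear extrapolation from the current state (assumption)."""
        t, x, y, vx, vy = self.current[obj_id]
        dt = t_future - t
        return (x + vx * dt, y + vy * dt)
```

Keeping the rarely changing static attributes out of the frequently updated position tables means a position update touches only the small current and history entries.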
Preferably, in this embodiment, the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data. Horizontally, it takes the high-resolution application as the basic unit, which organizes data of different types of spatial geographic resource bodies. Vertically, it manages data at multiple resolution levels. Together, these provide horizontal and vertical management of multi-level resource bodies of different types.
Example two:
Fig. 2 is a schematic diagram of the mass spatial information data storage management system according to the present invention. As shown in Fig. 2, the present invention further provides a mass spatial information data storage management system that uses the method disclosed above, comprising:
the distributed file system 1 is established in a master-slave mode;
the storage management node 10 is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;
the storage cluster 20 is used for distributed storage of file data by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
an external client coupled to the storage management node.
Beneficial effects of the invention: with the technical solution above, the invention achieves at least the following effects.

Because a distributed file system in master-slave mode is used, storage capacity is easy to expand. The whole storage system can be scaled horizontally simply by increasing the capacity and/or the number of data storage nodes. The physical details of changes in node capacity and number are hidden from applications that access the data, which better meets the scalable storage requirements of massive spatial information data.

Because the data storage nodes use a distributed storage strategy, each data storage node stores almost the same amount of data. During parallel retrieval, each node also retrieves roughly the same number of tiles, which effectively improves the concurrent query performance and load balance of the system.

The Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement. Its greatest advantage is excellent dispersion, which effectively keeps the system load balanced.
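The balance property can be checked empirically. Hash many tile keys, then count how many land on each node; with a well-mixed hash, each node's share should stay close to 1/n of the total. MD5, the tile-key format and the helper names below are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def node_for(key: str, n: int) -> int:
    """Hash-modulo placement: node number 1..n for a tile key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n + 1


def load_per_node(keys, n: int) -> Counter:
    """Count how many tiles each node receives."""
    return Counter(node_for(k, n) for k in keys)


if __name__ == "__main__":
    keys = [f"tile/{z}/{x}/{y}" for z in range(10, 13) for x in range(60) for y in range(60)]
    counts = load_per_node(keys, 8)
    print(sorted(counts.items()))
    print("max/min ratio:", max(counts.values()) / min(counts.values()))
```

A max/min ratio close to 1 indicates that each node both stores and, during parallel retrieval, scans roughly the same number of tiles.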
Fig. 3 is a software architecture diagram of the mass spatial information data storage management system according to the present invention. As shown in Figs. 2 and 3:
The system further comprises a storage virtualization layer between the storage management node and the storage cluster. This layer holds resource metadata as well as interface protocols and specifications, as detailed in Fig. 3:

* the resource metadata include, among other things, the heartbeat information of the storage nodes;
* the interface protocols and specifications include, among other things, the interface through which heartbeat information is reported to the storage management node.

Fig. 3 also shows that a data storage node may be a Linux server, a Unix server, a Windows server, or the like. Fig. 3 further shows other parts of the system; since these are not the focus of the present invention, they are not described in detail.
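Claim 1 states that when a data storage node is detected as failed or disconnected, the storage management node keeps tracking the affected block copies and starts re-replicating them. The heartbeat reports described here are a natural trigger for that. The sketch below makes several assumptions:

* a node is treated as failed after a fixed heartbeat timeout;
* each block has a target of three copies;
* each new copy goes to the least-loaded live node.

All names and parameters are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageManager:
    timeout: float = 30.0                               # seconds without heartbeat => failed
    replication: int = 3                                # target copies per block
    last_heartbeat: dict = field(default_factory=dict)  # node -> time of last report
    locations: dict = field(default_factory=dict)       # block handle -> set of nodes

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def live_nodes(self, now: float) -> set:
        return {n for n, t in self.last_heartbeat.items() if now - t <= self.timeout}

    def re_replicate(self, now: float) -> list[tuple[str, str, str]]:
        """Plan (block, source, target) copies for blocks below the target replication."""
        live = self.live_nodes(now)
        plan = []
        for block, nodes in self.locations.items():
            nodes &= live  # in place: forget copies on failed or disconnected nodes

            def load(node: str) -> tuple:
                # Number of blocks a node holds, with the name as a deterministic tie-breaker.
                return (sum(node in s for s in self.locations.values()), node)

            candidates = sorted(live - nodes, key=load)
            # A block with no surviving copy cannot be re-replicated, hence `while nodes`.
            while nodes and candidates and len(nodes) < self.replication:
                target = candidates.pop(0)
                plan.append((block, min(nodes), target))
                nodes.add(target)
        return plan
```

In this sketch the manager only plans the copies. A real storage management node would also instruct the source node to send each block and would update its metadata once the copy is confirmed.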
Fig. 4 is a schematic diagram of the network deployment of the storage management system of the present invention. As shown in Fig. 4:
a) The client computer is the access medium of the massive spatial information data storage and management system. It interacts with the back-end storage server in browser/server (B/S) mode, and data are transmitted between the storage server and the client computer. An owner unit can log in to its own account on the server from the computer and exercise the rights granted to it.
b) The dedicated-line access router connects dedicated lines to the spatial information data center and provides the data communication link. Downward, it connects to the client computers of all owner units; upward, it connects to the core switch of the spatial information data center.
c) The core switch is the data communication core of the spatial information data center. It provides high-speed packet switching for communication between the storage servers and the management server. Downward, it connects to the access router of the spatial information data center; upward, it connects to the access switch.
d) The access switch provides enough ports to give the storage servers and the management server network access. Upward, it connects to the storage servers and the management server; downward, it connects to the core switch.
e) The storage server runs the services for the owner units. Each owner unit logs in to the spatial information data storage and management system with its own account through the storage server to obtain services. Downward, the storage server connects to the access switch.
f) The management server organizes and manages the storage servers and is connected to the access switch.
It should be noted that the various embodiments of the present disclosure as described above generally relate to the processing of input data and the generation of output data to some extent. This input data processing and output data generation may be implemented in hardware or software in combination with hardware. For example, certain electronic components may be employed in a mobile device or similar or related circuitry for implementing the functions associated with the various embodiments of the present disclosure as described above. Alternatively, one or more processors operating in accordance with stored instructions may implement the functions associated with the various embodiments of the present disclosure as described above. If so, it is within the scope of the present disclosure that these instructions may be stored on one or more non-transitory processor-readable media. Examples of the processor-readable medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. In addition, functional computer programs, instructions, and instruction segments for implementing the present disclosure can be easily construed by programmers skilled in the art to which the present disclosure pertains.
While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.

Claims (7)

1. A mass spatial information data storage management method, comprising the following steps:
establishing a distributed file system in master-slave mode, together with a storage management node that is located under the distributed file system and performs full life cycle management of spatial information data;
creating a distributed storage cluster that stores file data in a distributed manner, with the file block as the basic unit, under instruction control of the storage management node; the cluster is formed by a plurality of data storage nodes and can be expanded horizontally and smoothly;
the data storage nodes adopt a distributed storage strategy implemented on the basis of a Hash modulo algorithm;
wherein,
before the actual file data are stored on a data storage node through the storage management node, an external client segments the whole file into file blocks of a predefined size and assigns a globally unique handle to each file block, the default block size being 64 MB;
all file blocks of the file data are replicated into a plurality of copies;
on the basis of the Hash modulo algorithm, a redundancy strategy is adopted in which a redundant fragmentation space is added to each data storage node so that each file block gets an additional copy; if a data storage node is detected as failed or disconnected, the storage management node keeps tracking the affected file block copies and starts re-replicating them;
the distributed file system adopts an information-separation hybrid indexing algorithm for indexing related spatio-temporal information, and the algorithm uses an improved hash table to index static information, dynamic historical spatio-temporal information, and current and future spatio-temporal information.
2. The method for storage management of mass spatial information data as claimed in claim 1, wherein the storage management node stores the file system metadata of the distributed file system and, on detecting any operation that modifies the file system metadata, records that operation in an edit log file.
3. The method for storage management of mass spatial information data as claimed in claim 2, wherein the number of copies is called the replication factor, and the file block size and replication factor of each file are configurable.
4. The method for storage and management of mass spatial information data according to claim 2, wherein the namespace, block-to-file mapping, and file attributes of the distributed file system are stored in the file system metadata, and the file system metadata is stored in the storage management node.
5. The method for storage and management of mass spatial information data as claimed in claim 1, wherein said distributed file system is implemented based on a relational database management system;
when the data storage node is damaged, a file content checksum mechanism is adopted;
when a new relational database management system file is created, a checksum is calculated for each file block of the file data and stored as a separate hidden file in the storage management node.
6. The method for storing and managing mass spatial information data according to claim 1, wherein the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data, taking the high-resolution application as the basic unit in the horizontal direction and managing data at multiple resolution levels in the vertical direction.
7. A mass spatial information data storage management system using the mass spatial information data storage management method according to any one of claims 1 to 6, comprising:
the distributed file system is established in a master-slave mode;
the storage management node is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;
the storage cluster is used for storing file data in a distributed manner by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
an external client coupled to the storage management node.
CN201610832422.4A 2016-09-19 2016-09-19 Mass spatial information data storage management method and storage management system Active CN106446126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610832422.4A CN106446126B (en) 2016-09-19 2016-09-19 Mass spatial information data storage management method and storage management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610832422.4A CN106446126B (en) 2016-09-19 2016-09-19 Mass spatial information data storage management method and storage management system

Publications (2)

Publication Number Publication Date
CN106446126A CN106446126A (en) 2017-02-22
CN106446126B true CN106446126B (en) 2021-04-20

Family

ID=58165641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610832422.4A Active CN106446126B (en) 2016-09-19 2016-09-19 Mass spatial information data storage management method and storage management system

Country Status (1)

Country Link
CN (1) CN106446126B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301094A (en) * 2017-05-10 2017-10-27 南开大学 The dynamic self-adapting data model inquired about towards extensive dynamic transaction
CN107423422B (en) * 2017-08-01 2019-09-24 武大吉奥信息技术有限公司 Spatial data distributed storage and search method and system based on grid
CN107656980B (en) * 2017-09-07 2020-09-22 北京神州绿盟信息安全科技股份有限公司 Method applied to distributed database system and distributed database system
CN107766645B (en) * 2017-10-19 2020-10-16 上海营邑城市规划设计股份有限公司 Planning BIM pipeline collision detection system storage management device
CN107943867B (en) * 2017-11-10 2021-11-23 中国电子科技集团公司第三十二研究所 High-performance hierarchical storage system supporting heterogeneous storage
CN109144966A (en) * 2018-07-06 2019-01-04 航天星图科技(北京)有限公司 A kind of high-efficiency tissue and management method of massive spatio-temporal data
CN111176584B (en) * 2019-12-31 2023-10-31 曙光信息产业(北京)有限公司 Data processing method and device based on hybrid memory
CN111414346A (en) * 2020-04-30 2020-07-14 武汉众邦银行股份有限公司 Distributed granulation storage method for massive unstructured data files
CN111930711B (en) * 2020-09-10 2020-12-29 北京志翔科技股份有限公司 Method, device and equipment for adding nodes to distributed file system cluster
CN112100146B (en) * 2020-09-21 2021-06-29 重庆紫光华山智安科技有限公司 Efficient erasure correction distributed storage writing method, system, medium and terminal
CN113094572A (en) * 2021-04-16 2021-07-09 中国工商银行股份有限公司 Service data processing method, device and equipment
CN113220234B (en) * 2021-05-17 2022-10-21 南京林洋电力科技有限公司 Terminal data storage management method and manager
CN113793110A (en) * 2021-07-01 2021-12-14 科尔比乐(广州)智能装备有限公司 Industrial equipment data acquisition and analysis method based on cloud computing and cloud service platform
CN113610484A (en) * 2021-07-01 2021-11-05 科尔比乐(广州)智能装备有限公司 Intelligent manufacturing production execution system L-MES and manufacturing method thereof
CN114925073B (en) * 2022-06-14 2024-04-16 深圳九有数据库有限公司 Distributed database system supporting flexible dynamic fragmentation and implementation method thereof
CN116915510B (en) * 2023-09-13 2023-12-01 北京数盾信息科技有限公司 Distributed storage system based on high-speed encryption algorithm

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US5423044A (en) * 1992-06-16 1995-06-06 International Business Machines Corporation Shared, distributed lock manager for loosely coupled processing systems
CN102855239B (en) * 2011-06-28 2016-04-20 清华大学 A kind of distributed geographical file system
CN103793442B (en) * 2012-11-05 2019-05-07 北京超图软件股份有限公司 The processing method and system of spatial data
CN105608155B (en) * 2015-12-17 2018-09-25 北京华油信通科技有限公司 Mass data distributed memory system

Also Published As

Publication number Publication date
CN106446126A (en) 2017-02-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant