CN113722280A - Storage analysis method for massive power network big data

Info

Publication number: CN113722280A
Application number: CN202110934464.XA
Authority: CN
Inventors: 谢洪潮; 朱家禄; 武明虎; 赵楠; 施阳
Original assignee: Shenglong Electric Group Co Ltd
Current assignee: Shenglong Electric Group Co Ltd
Priority date: 2021-08-16
Filing date: 2021-08-16
Publication date: 2021-11-30

Abstract

The invention discloses a storage analysis method for massive power network big data, comprising: step one, analyzing and classifying the big data; step two, displaying abnormal index data and issuing a reminder; step three, establishing a first buffer layer and storing structured data; step four, establishing a second buffer layer and storing semi-structured and unstructured data; step five, establishing a panoramic database based on Neo4j; step six, establishing a data partition management module; and step seven, establishing an efficient index method. By establishing the first buffer layer and the second buffer layer, the invention buffers and stores massive big data; by exploiting the characteristics of a distributed storage model, a plurality of servers share the storage load and the storage performance is improved. The distributed file system HDFS and the data partition management module manage the massive big data through distributed storage and multidirectional partitioning, and the efficient index method greatly shortens the response time of storage and later queries and reduces the difficulty of data management.

Description

Storage analysis method for massive power network big data

Technical Field

The invention relates to the technical field of power network big data, and in particular to a storage analysis method for massive power network big data.

Background

With the rapid development and wide application of information technology, power production enterprises, trading departments and users have accumulated large amounts of data through the Internet of Things and the Internet. As the scale and scope of database applications continue to expand, power management departments and related enterprises have strengthened their computerized transaction management, which produces very large data sets. Collecting and storing such data sets on a server is complicated: many existing data collection algorithms work well when the data set is small, but for large data sets the amount of computation becomes very high, and collection and storage become troublesome.

In existing conventional technology, power big data is stored in a unified manner, which increases the load on the storage. Large amounts of power data cannot be reasonably planned and organized, and simple indexes make later management and querying difficult.

Disclosure of Invention

In view of the above problems, the invention aims to provide a storage analysis method for massive power network big data. The method establishes a first buffer layer and a second buffer layer to buffer and store the massive big data; exploits the characteristics of a distributed storage model so that a plurality of servers share the storage load, which improves storage performance; uses the distributed file system HDFS and a data partition management module to manage the massive big data through distributed storage and multidirectional partitioning; and establishes an efficient index method that builds a global index on top of the indexes established for local data, which greatly shortens the response time of storage and later queries and reduces the difficulty of data management.

To achieve this purpose, the invention adopts the following technical scheme: a storage analysis method for massive power network big data, comprising the following steps:

step one, using a distributed storage model to temporarily store the massive power network big data acquired from a client, preliminarily analyzing the big data and classifying it into structured data, semi-structured data and unstructured data, and then encrypting it;

step two, after preliminarily analyzing the collected data, extracting some abnormal index data, independently exporting and comparing the extracted abnormal index data, and outputting and displaying the abnormal index data;

step three, establishing a first buffer layer based on the distributed file system HDFS, and storing the encrypted structured data into a Hive data warehouse based on the HDFS;

step four, establishing a second buffer layer based on the distributed file system HDFS, and storing the encrypted loose semi-structured data and unstructured data into a distributed database HBase based on the HDFS;

step five, constructing a panoramic database of the power grid based on Neo4j, establishing an equipment mapping table according to the power network topology, and orderly integrating the dispersed and isolated massive structured data, semi-structured data and unstructured data in the HDFS;

step six, establishing a data partition management module in the panoramic database, and dividing the data partition management module into horizontal partition management and vertical partition management to carry out internal arrangement;

and step seven, establishing a high-efficiency index method based on a double-layer index model, and providing access service for storage query based on an internet cloud platform.
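The classification in step one and the routing implied by steps three and four can be sketched as follows. This is only an illustrative sketch: the patent does not say how data is classified, so the schema `METER_FIELDS`, the rule "flat record with a known schema is structured", and the function names `classify` and `route` are all assumptions. Encryption is omitted.

```python
import json
from enum import Enum


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Hypothetical fixed schema of a structured meter reading (not from the patent).
METER_FIELDS = {"meter_id", "timestamp", "voltage", "current"}


def classify(payload):
    """Assign a payload to one of the three classes named in step one."""
    if isinstance(payload, dict):
        flat = not any(isinstance(v, (dict, list)) for v in payload.values())
        if flat and set(payload) == METER_FIELDS:
            return DataKind.STRUCTURED
        return DataKind.SEMI_STRUCTURED
    if isinstance(payload, str):
        try:
            parsed = json.loads(payload)
        except ValueError:
            return DataKind.UNSTRUCTURED  # free text that is not JSON
        if isinstance(parsed, (dict, list)):
            return DataKind.SEMI_STRUCTURED  # self-describing JSON document
    return DataKind.UNSTRUCTURED  # bytes, images, bare scalars


def route(payload):
    """Pick the target store: Hive for structured data, HBase otherwise."""
    return "hive" if classify(payload) is DataKind.STRUCTURED else "hbase"
```

Under these assumptions a flat meter reading is routed to the Hive data warehouse, while nested event documents and binary payloads are routed to HBase.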

The further improvement lies in that: in the distributed storage model in the first step, a plurality of servers are used for sharing storage load, and meanwhile, the storage performance of the distributed storage model is further improved through expansion in the later period.

The further improvement lies in that: in step two, a warning is issued promptly when the abnormal index data is displayed, reminding the relevant personnel to pay close attention to the abnormal index data and, where necessary, to track and handle it in time.
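A minimal sketch of extracting abnormal index data and producing warnings, as described for step two. The patent does not define what counts as abnormal; the voltage limits `LOW_V` and `HIGH_V` and the record layout are assumptions chosen for illustration.

```python
# Hypothetical limits: plus or minus 10 % around a 220 V nominal voltage.
LOW_V, HIGH_V = 198.0, 242.0


def extract_abnormal(readings, low=LOW_V, high=HIGH_V):
    """Return the readings whose voltage lies outside [low, high]."""
    return [r for r in readings if not (low <= r["voltage"] <= high)]


def build_warnings(abnormal):
    """Format one warning message per abnormal reading."""
    return [
        f"WARNING meter={r['meter_id']} voltage={r['voltage']} V out of range"
        for r in abnormal
    ]
```

The warnings could then be displayed or pushed to the responsible personnel; the delivery channel is outside the scope of this sketch.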

The further improvement lies in that: the distributed file system HDFS in step three and step four is a highly fault-tolerant distributed file system that is deployed on low-cost machines, provides high-throughput data access, and is suitable for large-scale data sets.

The further improvement lies in that: the first buffer layer in the third step is a diversified storage medium cache structure based on a DRAM and a solid state disk, access data are divided into read data and write data according to an access mode, the DRAM stores the read data and also stores the write data, the solid state disk only stores the write data, and the write data are stored in the DRAM and simultaneously stored in the solid state disk.
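The DRAM and solid state disk cache structure described above can be sketched as follows. Both media are simulated with in-memory dictionaries, and the class name `HybridCache`, the `flush` method and the dictionary standing in for the Hive-backed store are assumptions; the patent does not describe eviction or flushing policy.

```python
class HybridCache:
    """DRAM holds read data and write data; the SSD holds write data only."""

    def __init__(self, backing_store):
        self.dram = {}                # simulated DRAM tier
        self.ssd = {}                 # simulated solid state disk tier
        self.backing = backing_store  # stand-in for the Hive-backed store

    def write(self, key, value):
        # Write data is stored in DRAM and, at the same time, on the SSD.
        self.dram[key] = value
        self.ssd[key] = value

    def read(self, key):
        if key in self.dram:
            return self.dram[key]
        if key in self.ssd:
            return self.ssd[key]
        value = self.backing[key]
        self.dram[key] = value        # read data is cached in DRAM only
        return value

    def flush(self):
        """Hypothetical step: persist buffered writes to the backing store."""
        self.backing.update(self.ssd)
        self.ssd.clear()
```

Keeping a second copy of write data on the SSD means buffered writes survive the loss of DRAM contents, while read data, which can always be fetched again, occupies only DRAM.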

The further improvement lies in that: the second buffer layer in step four is a storage cache structure based on a parallel execution engine, which caches the loose semi-structured data and the unstructured data and thereby reduces the storage load of the distributed database HBase.

The further improvement lies in that: the parallel execution engine divides one storage or reading operation into a plurality of mutually independent storage or reading operations and executes the operations, and summarizes results after the execution is finished.
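The split, execute and summarize behaviour of the parallel execution engine can be sketched with a thread pool. The helper names `split` and `parallel_read`, the round-robin splitting rule and the dictionary used as the store are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split(keys, parts):
    """Divide one list of keys into up to `parts` independent chunks."""
    chunks = [keys[i::parts] for i in range(parts)]
    return [chunk for chunk in chunks if chunk]


def parallel_read(store, keys, workers=4):
    """Run one read as several independent reads and merge the results."""

    def read_chunk(chunk):
        return {key: store[key] for key in chunk}

    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(read_chunk, split(keys, workers)):
            merged.update(partial)  # summarize the partial results
    return merged
```

Because the chunks share no keys, the sub-operations are mutually independent and their results can be merged in any order.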

The further improvement lies in that: in step five, Neo4j models the power network big data as a graph and expresses the big data of the power domain in a node space, in which nodes and edges can be traversed at the same speed as in a traditional relational database.
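The graph model and the equipment mapping table of step five can be illustrated without a database. The sketch below uses plain Python structures as a stand-in for Neo4j and does not use any Neo4j API; the node names, the `FEEDS` relationship and the storage locations in `equipment_map` are all hypothetical.

```python
from collections import deque

# Hypothetical topology: one substation feeds a transformer, which feeds two meters.
edges = [
    ("sub1", "tx1", "FEEDS"),
    ("tx1", "m1", "FEEDS"),
    ("tx1", "m2", "FEEDS"),
]

# Hypothetical equipment mapping table: where each device's data lives in HDFS-based stores.
equipment_map = {
    "m1": {"hive_table": "meter_readings", "hbase_row": "m1"},
    "m2": {"hive_table": "meter_readings", "hbase_row": "m2"},
}


def build_adjacency(edge_list):
    """Turn (source, target, relationship) triples into an adjacency map."""
    adjacency = {}
    for src, dst, _rel in edge_list:
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def downstream(adjacency, start):
    """Breadth-first traversal listing every device fed from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def locate(adjacency, start):
    """Resolve downstream devices to their storage locations via the mapping table."""
    return {d: equipment_map[d] for d in downstream(adjacency, start) if d in equipment_map}
```

A traversal from a substation thus yields the storage locations of all data belonging to the equipment it feeds, which is one way dispersed data could be integrated along the network topology.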

The further improvement lies in that: in the sixth step, horizontal partition management and vertical partition management of the data partition management module are established to adapt to operations for storing and accessing the same record or storing and accessing the same attribute of different records, so that the operation response time is shortened.
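The two partitioning directions can be sketched as follows. Horizontal partitioning keeps each record whole, which suits access to the same record; vertical partitioning groups columns, which suits access to the same attribute of different records. The function names, the modulo rule on an integer key and the sample columns are assumptions.

```python
def horizontal_partition(rows, key, n_parts):
    """Distribute whole records across partitions by integer key modulo n_parts."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[key] % n_parts].append(row)
    return parts


def vertical_partition(rows, key, column_groups):
    """Split columns into groups; each group keeps the key so records can be rejoined."""
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]
```

For example, a query for all voltage values would touch only the vertical partition that holds the `voltage` column, while a query for one meter's full record would touch only one horizontal partition.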

The further improvement lies in that: the efficient index method based on the double-layer index model in step seven establishes, on top of the indexes established for the local data, a global index over the actual storage range of the local data, so as to improve the time performance of data query and insertion operations; the access service for storage queries in step seven receives storage query requests for the power big data from the system and from users through an internet cloud platform, and executes the storage query operations by calling the relevant statements through SQL statements and a complex storage query interface.
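One possible reading of the double-layer index model is sketched below: each partition keeps a sorted local index, and a global index records the actual key range stored in each partition so that a lookup can skip partitions that cannot contain the key. The class names, the fixed routing bounds `lower_bounds` and the use of sorted lists with `bisect` are assumptions; the patent does not specify the index structures.

```python
import bisect


class LocalIndex:
    """Sorted-key index over the data of one partition."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value          # overwrite an existing key
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class TwoLayerIndex:
    """Global index of actual key ranges on top of per-partition local indexes."""

    def __init__(self, lower_bounds):
        self.lower_bounds = list(lower_bounds)  # hypothetical routing bounds, sorted
        self.local = [LocalIndex() for _ in self.lower_bounds]
        self.ranges = [None] * len(self.lower_bounds)  # actual (min, max) per partition

    def _route(self, key):
        i = bisect.bisect_right(self.lower_bounds, key) - 1
        if i < 0:
            raise KeyError(key)
        return i

    def insert(self, key, value):
        i = self._route(key)
        self.local[i].insert(key, value)
        lo, hi = self.ranges[i] or (key, key)
        self.ranges[i] = (min(lo, key), max(hi, key))  # keep global index current

    def get(self, key):
        i = self._route(key)
        rng = self.ranges[i]
        if rng is None or not (rng[0] <= key <= rng[1]):
            return None  # pruned by the global index, local index not consulted
        return self.local[i].get(key)
```

In this sketch the global layer answers "which partition, if any, could hold this key" and the local layer answers "where inside that partition", so both lookups are binary searches over sorted data.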

The invention has the beneficial effects that:

(1) according to the invention, the first buffer layer and the second buffer layer are established to buffer and store massive big data, so that the storage load caused by unified storage of the big data is effectively reduced, the storage load is shared by a plurality of servers by utilizing the characteristics of a distributed storage model, and the storage performance is improved;

(2) the distributed file system HDFS and the data partition management module perform distributed storage and multidirectional partition effective management on massive big data, and response time of storage and later-stage query is greatly shortened;

(3) the method for establishing the high-efficiency index based on the double-layer index model establishes the global index again on the basis of establishing the index for the local data, the double indexes take effect simultaneously, the efficiency of data storage and query is greatly improved, and the difficulty of data management is reduced.

Drawings

FIG. 1 is a flow chart of embodiment 1 of the present invention.

FIG. 2 is a flow chart of embodiment 2 of the present invention.

FIG. 3 is a diagram of a data classification architecture in step one of the present invention.

FIG. 4 is a diagram of the storage architectures of step three and step four in embodiment 1 of the present invention.

FIG. 5 is a diagram of the storage architecture of step three in embodiment 2 of the present invention.

FIG. 6 is a block diagram of a five-step data partition management module according to the present invention.

Detailed Description

To aid further understanding of the present invention, it is described in detail below with reference to embodiments, which serve only to explain the invention and are not to be construed as limiting its scope.

Embodiment 1

Referring to FIGS. 1, 3, 4 and 6, this embodiment provides a storage analysis method for massive power network big data, including the following steps:

In the distributed storage model in the first step, a plurality of servers are used for sharing storage load, and meanwhile, the storage performance of the distributed storage model is further improved through expansion in the later period.

In step two, a warning is issued promptly when the abnormal index data is displayed, reminding the relevant personnel to pay close attention to the abnormal index data and, where necessary, to track and handle it in time.

The distributed file system HDFS in step three and step four is a highly fault-tolerant distributed file system that is deployed on low-cost machines, provides high-throughput data access, and is suitable for large-scale data sets.

The first buffer layer in the third step is a diversified storage medium cache structure based on a DRAM and a solid state disk, access data are divided into read data and write data according to an access mode, the DRAM stores the read data and also stores the write data, the solid state disk only stores the write data, and the write data are stored in the DRAM and simultaneously stored in the solid state disk.

The second buffer layer in step four is a storage cache structure based on a parallel execution engine, which caches the loose semi-structured data and the unstructured data and thereby reduces the storage load of the distributed database HBase; the parallel execution engine divides one storage or read operation into a plurality of mutually independent storage or read operations, executes them, and summarizes the results after execution is finished.

In step five, Neo4j models the power network big data as a graph and expresses the big data of the power domain in a node space, in which nodes and edges can be traversed at the same speed as in a traditional relational database.

In the sixth step, horizontal partition management and vertical partition management of the data partition management module are established to adapt to operations for storing and accessing the same record or storing and accessing the same attribute of different records, so that the operation response time is shortened.

The efficient index method based on the double-layer index model in step seven establishes, on top of the indexes established for the local data, a global index over the actual storage range of the local data, so as to improve the time performance of data query and insertion operations; the access service for storage queries in step seven receives storage query requests for the power big data from the system and from users through an internet cloud platform, and executes the storage query operations by calling the relevant statements through SQL statements and a complex storage query interface.

Embodiment 2

Referring to FIGS. 2, 3, 5 and 6, this embodiment provides a storage analysis method for massive power network big data, including the following steps:

step three, establishing a buffer layer based on the distributed file system HDFS, storing the encrypted structured data into a Hive data warehouse based on the HDFS, and storing the encrypted loose semi-structured data and the unstructured data into a distributed database HBase based on the HDFS;

step four, constructing a panoramic database of the power grid based on Neo4j, establishing an equipment mapping table according to the power network topology, and orderly integrating the dispersed and isolated massive structured data, semi-structured data and unstructured data in the HDFS;

step five, establishing a data partition management module in the panoramic database, and dividing the data partition management module into horizontal partition management and vertical partition management for internal arrangement;

and step six, establishing a high-efficiency index method based on a double-layer index model, and providing access service for storage query based on an internet cloud platform.

In step two, a warning is issued promptly when the abnormal index data is displayed, reminding the relevant personnel to pay close attention to the abnormal index data and, where necessary, to track and handle it in time.

The HDFS in step three is a highly fault-tolerant distributed file system that is deployed on low-cost machines, provides high-throughput data access, and is suitable for large-scale data sets.

The buffer layer in step three is a storage cache structure based on a parallel execution engine, which caches the structured data, the loose semi-structured data and the unstructured data and thereby reduces the storage load of the Hive data warehouse and the distributed database HBase; the parallel execution engine divides one storage or read operation into a plurality of mutually independent storage or read operations, executes them, and summarizes the results after execution is finished.

In step four, Neo4j models the power network big data as a graph and expresses the big data of the power domain in a node space, in which nodes and edges can be traversed at the same speed as in a traditional relational database.

In step five, horizontal partition management and vertical partition management of the data partition management module are established to suit operations that store and access the same record and operations that store and access the same attribute of different records, so that the operation response time is shortened.

The efficient index method based on the double-layer index model in step six establishes, on top of the indexes established for the local data, a global index over the actual storage range of the local data, so as to improve the time performance of data query and insertion operations; the access service for storage queries receives storage query requests for the power big data from the system and from users through an internet cloud platform, and executes the storage query operations by calling the relevant statements through SQL statements and a complex storage query interface.

According to the invention, the first buffer layer and the second buffer layer are established to buffer and store massive big data, so that the storage load caused by unified storage of the big data is effectively reduced, the storage load is shared by a plurality of servers by utilizing the characteristics of a distributed storage model, and the storage performance is improved;

the distributed file system HDFS and the data partition management module perform distributed storage and multidirectional partition effective management on massive big data, and response time of storage and later-stage query is greatly shortened;

the method for establishing the high-efficiency index based on the double-layer index model establishes the global index again on the basis of establishing the index for the local data, the double indexes take effect simultaneously, the efficiency of data storage and query is greatly improved, and the difficulty of data management is reduced.

The foregoing illustrates and describes the principles, general features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the specification merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A storage analysis method for massive power network big data is characterized by comprising the following steps:

2. The storage analysis method for massive power network big data according to claim 1, characterized in that: in the distributed storage model in step one, a plurality of servers are used to share the storage load, and the storage performance of the distributed storage model can be further improved by later expansion.

3. The storage analysis method for massive power network big data according to claim 1, characterized in that: in step two, a warning is issued promptly when the abnormal index data is displayed, reminding the relevant personnel to pay close attention to the abnormal index data and, where necessary, to track and handle it in time.

4. The storage analysis method for massive power network big data according to claim 1, characterized in that: the distributed file system HDFS in step three and step four is a highly fault-tolerant distributed file system that is deployed on low-cost machines, provides high-throughput data access, and is suitable for large-scale data sets.

5. The storage analysis method for massive power network big data according to claim 1, characterized in that: the first buffer layer in step three is a multi-medium storage cache structure based on DRAM and a solid state disk; accessed data is divided into read data and write data according to the access mode; the DRAM stores both read data and write data, the solid state disk stores only write data, and write data is stored in the DRAM and in the solid state disk at the same time.

6. The storage analysis method for massive power network big data according to claim 1, characterized in that: the second buffer layer in step four is a storage cache structure based on a parallel execution engine, which caches the loose semi-structured data and the unstructured data and thereby reduces the storage load of the distributed database HBase.

7. The storage analysis method for massive power network big data according to claim 6, characterized in that: the parallel execution engine divides one storage or read operation into a plurality of mutually independent storage or read operations, executes them, and summarizes the results after execution is finished.

8. The storage analysis method for massive power network big data according to claim 1, characterized in that: in step five, Neo4j models the power network big data as a graph and expresses the big data of the power domain in a node space, in which nodes and edges can be traversed at the same speed as in a traditional relational database.

9. The storage analysis method for massive power network big data according to claim 1, characterized in that: in step six, horizontal partition management and vertical partition management of the data partition management module are established to suit operations that store and access the same record and operations that store and access the same attribute of different records, so that the operation response time is shortened.

10. The storage analysis method for massive power network big data according to claim 1, characterized in that: the efficient index method based on the double-layer index model in step seven establishes, on top of the indexes established for the local data, a global index over the actual storage range of the local data, so as to improve the time performance of data query and insertion operations; the access service for storage queries in step seven receives storage query requests for the power big data from the system and from users through an internet cloud platform, and executes the storage query operations by calling the relevant statements through SQL statements and a complex storage query interface.