CN111897877A

CN111897877A - High-performance and high-reliability data sharing system and method based on distributed thought

Info

Publication number: CN111897877A
Application number: CN202010805266.9A
Authority: CN
Inventors: 丁怀雷; 李振
Original assignee: Inspur Software Co Ltd
Current assignee: Inspur Software Co Ltd
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2020-11-06
Anticipated expiration: 2040-08-12
Also published as: CN111897877B

Abstract

The invention discloses a high-performance and high-reliability data sharing system and method based on a distributed idea, belonging to the technical field of information. It addresses the low efficiency and insecurity of large-scale data sharing, the difficulty of managing data sharing, the unintuitive sharing mode and the difficulty of monitoring shared content. The technical scheme is as follows: the system comprises a database connection management unit, a timing task management unit, a task log management unit, a task statistics unit, a task monitoring unit and a failure report unit. The method comprises the following steps: S1, configuring and starting a data sharing task, which periodically extracts data from a source database according to a defined rule to form a data text file, compresses and encrypts the file, uploads it to a document center, and records an access link; S2, providing an external data sharing interface through which a user obtains a download link; and S3, calling the download link to download the encrypted file from the document center, and decrypting it with the key distributed in advance to obtain the data.

Description

High-performance and high-reliability data sharing system and method based on distributed thought

Technical Field

The invention relates to the technical field of information, mainly relates to a high-efficiency and safe data extraction and data sharing technology among multiple information systems, and particularly relates to a high-performance and high-reliability data sharing system and method based on a distributed idea.

Background

At present, the informatization of enterprises has reached a certain scale and has accumulated a large amount of highly valuable data. This data is an important asset of the enterprise and the foundation on which it survives and keeps developing. For various reasons, however, the data resources are monopolized by the individual system developers (hereinafter referred to as developers). Analyzing data within an enterprise requires coordinating different developers so that some data is opened up between them. To let developers share data conveniently and quickly, enterprises currently share data in three general ways:

The first mode is as follows: the enterprise coordinates the developers to share data through interfaces. This works for small volumes of data but copes poorly with larger volumes. For example, it struggles with the daily synchronization of national tobacco industry retail terminal sales order data (about ten million orders per day) to a tobacco marketing analysis system. Moreover, the interfaces are generally developed and owned by each developer, so the enterprise generally cannot manage and monitor the shared data in real time.

The second mode is as follows: the enterprise coordinates the developers to open read-only access to the database. This requires disclosing the detailed design of the database table structure so that other vendors can extract the data they need. This approach easily leaks the underlying design of the developer's database, which is not only insecure but also unfair to the developers participating in data sharing.

The third mode is as follows: the enterprise coordinates the developers and uses existing data extraction tools, such as ETL tools, to extract the data to be shared into a single shared database. This mode involves a heavy workload, consumes a large amount of human resources, and is neither secure nor intuitive.

In summary, the prior art has the following problems: data sharing within an enterprise has long been difficult, large-scale data is hard to share, and the traditional sharing modes are inefficient and insecure; in addition, data sharing is hard to manage, the sharing mode is not intuitive, and the shared content is hard to monitor.

Disclosure of Invention

The technical task of the invention is to provide a high-performance and high-reliability data sharing system and method based on a distributed thought, so as to solve the problems of low efficiency and insecurity of large-scale data sharing, difficult management of data sharing, non-intuitive sharing mode and difficult monitoring of shared content.

The technical task of the present invention is achieved as follows: a high-performance and high-reliability data sharing system based on a distributed idea, which comprises,

the database connection management unit is used for newly adding a source database to be shared and modifying or deleting database connection;

the timing task management unit is used for adding, modifying, starting, stopping, deleting and manually compensating timing tasks;

the task log management unit is used for viewing the detailed execution process of each historical task and the storage position of the generated encrypted data file in the document center, and for providing SQL viewing, log deletion and advanced operations;

the task counting unit is used for counting the execution process of each task;

the task monitoring unit is used for monitoring any task in real time, checking and analyzing the running state of the task in a visual chart mode, and providing all-around data sharing task monitoring information for an administrator;

and the failure report unit is used for generating a report for each failed task, so that the administrator can see at a glance which stage of the failed task went wrong.

Preferably, the database connection management unit includes,

the database connection adding module is used for adding a source database to be shared through a database connection management function, providing a test connection button, testing the connectivity of the database at any time and managing the connection of various databases;

and the database connection modification and deletion module is used for modifying or deleting the existing database connection at any time.

Preferably, the newly added content of the source database to be shared includes a connection name, a connection address, a connection port number, a database name, a database user name, a database password, and a database type.

Preferably, the timed task management unit includes,

the timing task newly-adding module is used for newly adding a timing task through a newly-added function of timing task management;

the timing task modification module is used for modifying the existing data sharing timing task;

the timing task starting, stopping or deleting module is used for carrying out corresponding starting, stopping or deleting operation on the existing data sharing timing task;

and the manual compensation task module is used for rapidly and freely performing manual compensation on the historical data.

Preferably, the information to be maintained by the newly added timing task includes a task name, a convention table name, a connection name, a cron expression, a task step size, an extraction time format, extraction start time, a time offset, a key type, an SQL type, a task SQL and a timestamp field full name;

the convention table name is a measure that protects the developer's underlying database design; the developer may fill in this field with an arbitrary name;

the cron expression refers to the scheduling-time expression commonly used in the Linux operating system;

the task step size refers to the length of the time span of data taken from the source database each time the task is executed, measured in minutes;

the time offset refers to how far the time window needs to be shifted forward when the task step size is applied, and is used to solve the problem of delayed data storage when database transactions are used to process complex services;

the key type means that different public/private key pairs can be selected when the file is encrypted;

the SQL type lets a system administrator tell the system which parameter filling strategy to use when it executes the task SQL, namely automatically spliced parameters or manually defined parameters;

the task SQL refers to the structured query language statement of the data sharing task;

the timestamp field full name addresses the fact that developers name their timestamp fields differently, and indicates which field in the task SQL represents the timestamp.
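How the task step size, time offset and timestamp field full name might combine for a single run can be sketched as follows. This is only an illustration under stated assumptions: the function names `extraction_window` and `splice_time_filter` are hypothetical, the offset is interpreted as moving the end of the window back from the trigger time, and the "automatically spliced parameters" strategy is approximated by naively appending a range condition to the task SQL. The source does not specify any of these details.

```python
from datetime import datetime, timedelta


def extraction_window(run_time, step_minutes, offset_minutes):
    """Return the half-open window [start, end) that one run extracts.

    Assumption: the window ends `offset_minutes` before the trigger time,
    so rows written late by long transactions are not missed.
    """
    end = run_time - timedelta(minutes=offset_minutes)
    start = end - timedelta(minutes=step_minutes)
    return start, end


def splice_time_filter(task_sql, timestamp_field, start, end,
                       time_format="%Y-%m-%d %H:%M:%S"):
    """Naively append a timestamp range to the task SQL.

    Returns (sql, params). Hypothetical stand-in for the automatic
    parameter splicing strategy; it only checks whether a WHERE exists.
    """
    has_where = " where " in f" {task_sql.lower()} "
    joiner = " AND " if has_where else " WHERE "
    sql = f"{task_sql}{joiner}{timestamp_field} >= ? AND {timestamp_field} < ?"
    return sql, (start.strftime(time_format), end.strftime(time_format))


if __name__ == "__main__":
    start, end = extraction_window(datetime(2020, 8, 12, 10, 0), 30, 5)
    print(splice_time_filter("SELECT * FROM orders o", "o.created_at", start, end))
```

Keeping the window half-open means consecutive runs neither overlap nor leave gaps, which matters when each run produces its own encrypted file.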

Preferably, the task log management unit includes,

the task log clearing module is used for clearing all histories of the selected tasks and clearing unnecessary history records;

the SQL checking module is used for checking the specific execution SQL of the selected task and assisting in analyzing problems in the data sharing process;

the log deleting module is used for deleting the task logs one by one;

and the advanced operation module is used for providing more advanced operations for log management.

Preferably, the content counted by the task counting unit comprises the number of shared data rows, the size of the shared data text file, the size of the shared data encrypted file and the time taken to connect to the database.

Preferably, the data sharing system provides two operation modes, specifically as follows:

① Single-Master mode: the data sharing system works on one server and can automatically simulate a plurality of Slave sub-nodes in a multithreading mode;

② Master-Slave mode: the data sharing system is manually split into a Master node and a Slave node, wherein the Master node works on one server, the Slave node works on other servers, and MQ message middleware is adopted for communication between the Master node and the Slave node.
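The single-Master mode described above can be pictured with a minimal sketch in which worker threads play the role of the simulated Slave sub-nodes. The function name `run_single_master`, the in-process `queue.Queue` and the result format are assumptions made for illustration; in the Master-Slave mode the queue would be replaced by the MQ message middleware, whose product and API the source does not name.

```python
import queue
import threading


def run_single_master(tasks, handler, slave_count=4):
    """Dispatch tasks to `slave_count` worker threads and collect outcomes.

    The calling thread acts as the Master (it only enqueues work); each
    thread acts as a simulated Slave. Tasks must be hashable.
    """
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def slave():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this slave
                return
            try:
                outcome = ("ok", handler(task))
            except Exception as exc:  # record the failure, keep the slave alive
                outcome = ("failed", repr(exc))
            with lock:
                results[task] = outcome

    workers = [threading.Thread(target=slave) for _ in range(slave_count)]
    for worker in workers:
        worker.start()
    for task in tasks:
        task_queue.put(task)
    for _ in workers:
        task_queue.put(None)  # one sentinel per slave
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_single_master(["t1", "t2", "t3"], lambda name: name.upper(), 2))
```

Recording failures instead of letting a thread die keeps the remaining tasks flowing and leaves a list of tasks that a compensation step could pick up later.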

A high-performance and high-reliability data sharing method based on a distributed idea comprises the following specific steps:

S1, a system administrator configures and starts a data sharing task; the task periodically extracts data from a source database according to a defined rule to form a data text file, compresses and encrypts the file, uploads it to a document center (an unstructured database management system inside the enterprise) and records an access link;

S2, the data sharing system provides an external data sharing interface, through which a user obtains a download link;

and S3, the user calls the download link to download the encrypted file from the document center, and decrypts it with the key distributed in advance to obtain the data.

Preferably, the data sharing system can automatically split a sharing task with a huge data volume along the time dimension according to the task step size set by an administrator and execute the resulting tasks in a distributed manner, thereby reducing the pressure on the source database;

the data sharing system obtains data from the source database to form a text file, encrypts the text file with the RSA asymmetric encryption algorithm to form an encrypted file, and transmits the encrypted file to the document center for storage; a user who needs the shared data downloads it directly from the document center through a link, which shifts the download load away from the data sharing system;

a user who needs the shared data obtains the target data in two steps, specifically as follows:

firstly, calling an interface provided by a data sharing system;

and secondly, calling an interface of the document center according to the result obtained in the first step to obtain the encrypted file.
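The flow of steps S1 to S3 and the two-step retrieval can be sketched end to end as follows. Everything named here is an assumption for illustration: `DocumentCenter` is an in-memory stand-in for the enterprise document center, `SharingSystem` and `fetch` are hypothetical names, the `doc://` link scheme and the comma-separated text format are invented, and `xor_placeholder` is a toy reversible transform that only marks where the RSA encryption and decryption steps would sit. It is not RSA and offers no security.

```python
import gzip
import uuid


def xor_placeholder(blob, key=0x5A):
    """Toy reversible transform standing in for the RSA step (NOT secure)."""
    return bytes(b ^ key for b in blob)


class DocumentCenter:
    """In-memory stand-in for the document center."""

    def __init__(self):
        self._files = {}

    def upload(self, blob):
        link = f"doc://{uuid.uuid4().hex}"  # invented link format
        self._files[link] = blob
        return link

    def download(self, link):
        return self._files[link]


class SharingSystem:
    def __init__(self, doc_center, encrypt):
        self._doc_center = doc_center
        self._encrypt = encrypt
        self._links = {}  # task name -> recorded access links

    def publish(self, task_name, rows):
        """S1: rows -> text file -> compress -> encrypt -> upload -> record link."""
        text = "\n".join(",".join(map(str, row)) for row in rows)
        blob = self._encrypt(gzip.compress(text.encode("utf-8")))
        link = self._doc_center.upload(blob)
        self._links.setdefault(task_name, []).append(link)
        return link

    def get_links(self, task_name):
        """S2: the sharing interface hands out links only, never the data."""
        return list(self._links.get(task_name, []))


def fetch(sharing, doc_center, task_name, decrypt):
    """S3: ask the sharing system for links, then download and decrypt each file."""
    texts = []
    for link in sharing.get_links(task_name):
        blob = doc_center.download(link)
        texts.append(gzip.decompress(decrypt(blob)).decode("utf-8"))
    return texts


if __name__ == "__main__":
    center = DocumentCenter()
    system = SharingSystem(center, xor_placeholder)
    system.publish("orders", [(1, "a"), (2, "b")])
    print(fetch(system, center, "orders", xor_placeholder))
```

Because the sharing interface returns only links, the bulk transfer happens between the consumer and the document center, which is the load-shifting effect the text describes.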

The high-performance and high-reliability data sharing system and method based on the distributed thought have the following advantages:

(I) The invention provides a brand-new solution for data sharing within an enterprise and breaks through the barriers of traditional data sharing. The sharing mode is intuitive and the sharing process is efficient, secure and reliable; sharing tasks can be managed conveniently and efficiently and monitored in real time. This saves a large amount of human resource cost and provides strong support for data sharing within an enterprise, particularly for rapid sharing of very large-scale data;

(II) The invention effectively solves the problem of data sharing within an enterprise, particularly large-scale data sharing: data sharing tasks can be managed conveniently and quickly, and large-scale data can be shared efficiently and securely;

(III) To reduce the access pressure on the source database during data aggregation and avoid affecting the normal operation of other business systems, a large data sharing task is divided into N small data sharing tasks. The tasks connect to the database through a database connection pool; a data extraction task is executed at each corresponding interval and extracts one step (time span) of data at a time, and the extracted data is compressed, encrypted and uploaded to the document center for storage;

(IV) In addition to the features of common data extraction and sharing tools, the invention has the following characteristics:

① Security: the data sharing interface can be called only with permission granted to a developer by the platform, which secures the interface; data files are stored in the document center in RSA-encrypted form, and a developer must apply for the RSA private key in order to decrypt a downloaded data file;

② High performance: following a distributed idea, the system is based on a Master-Slave mode (one Master, multiple Slaves) in which the Master node only generates and dispatches tasks and the Slave nodes process them. There can be multiple Slave nodes, so large-scale data extraction and sharing tasks can be processed simultaneously;

③ High availability: tasks dispatched by the Master node are stored in message middleware (MQ for short), and there are multiple Slave nodes deployed on different machines, which ensures the high availability of the system;

④ Convenient management: through the system, a system administrator can add a data sharing task, modify a data extraction rule, and start, stop or delete a data sharing task at any time;

⑤ Monitoring and statistics: the invention can check the running status of tasks in real time and monitor their success and failure; it also provides task statistics reports and visual charts in multiple dimensions, so that the running status of tasks can be tracked in real time;

⑥ Compensation: for failed tasks, the invention provides automatic and manual compensation. When a task failure is detected, a compensation mechanism starts immediately to compensate the failed task, and the administrator can also compensate manually, which is convenient and quick;

⑦ Reducing the pressure on the source database: the invention can split a single data sharing task into n small tasks that read data from the database in time-based segments, which avoids putting pressure on the source database and improves the speed of data extraction and sharing;

⑧ Reducing the pressure on the data sharing system: the data obtained from the source database is processed and then sent to the document center for storage, and a system developer who needs the shared data downloads it directly from the document center through a link, which shifts the load away from the data sharing system.
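The task splitting and compensation characteristics listed above can be illustrated with a short sketch. The names `split_by_step` and `run_with_compensation`, the retry count and the sequential execution are assumptions for illustration only; the source does not describe how the compensation mechanism is implemented or how many retries it performs.

```python
from datetime import datetime, timedelta


def split_by_step(start, end, step_minutes):
    """Split [start, end) into consecutive windows of at most `step_minutes`."""
    if step_minutes <= 0:
        raise ValueError("step_minutes must be positive")
    step = timedelta(minutes=step_minutes)
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # last window may be shorter
        windows.append((cursor, upper))
        cursor = upper
    return windows


def run_with_compensation(windows, extract, max_retries=2):
    """Call extract(start, end) per window, retrying failures.

    Returns the windows that still failed after all retries; these are
    the candidates for manual compensation by an administrator.
    """
    still_failed = []
    for window in windows:
        for _attempt in range(max_retries + 1):
            try:
                extract(*window)
                break  # this window succeeded
            except Exception:
                continue  # automatic compensation: try the window again
        else:
            still_failed.append(window)
    return still_failed


if __name__ == "__main__":
    day = split_by_step(datetime(2020, 8, 12), datetime(2020, 8, 13), 60)
    print(len(day), "windows; first:", day[0])
```

Each small window maps to one bounded query against the source database, so a retry repeats only that window rather than the whole extraction.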

Drawings

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a high-performance and high-reliability data sharing system based on a distributed concept;

FIG. 2 is an architecture diagram of the working principle of the data sharing system of FIG. 1;

FIG. 3 is an interface screenshot of a newly added database connection;

FIG. 4 is a first interface screenshot of a newly added timed task module;

FIG. 5 is a second interface screenshot of a newly added timing task module;

FIG. 6 is a first interface screenshot of a task log management unit;

FIG. 7 is a second interface screenshot of a task log management unit;

FIG. 8 is an interface screenshot of a task statistics unit;

FIG. 9 is an interface screenshot of a line graph of a task monitoring unit;

FIG. 10 is an interface screenshot of a pie chart of a task monitor unit;

FIG. 11 is an interface screenshot of a failure report.

Detailed Description

The high-performance and high-reliability data sharing system and method based on the distributed idea of the present invention are described in detail below with reference to the drawings and the embodiments of the specification.

Example 1:

as shown in fig. 1, the high-performance and high-reliability data sharing system based on the distributed concept of the present invention includes,

the database connection management unit is used for newly adding a source database to be shared and modifying or deleting database connection; the database connection management unit includes,

a database connection adding module, configured to add a source database to be shared through a database connection management function, and provide a test connection button to test connectivity of the database at any time and manage connections of multiple databases, as shown in fig. 3;

The newly added content of the source database to be shared comprises a connection name, a connection address, a connection port number, a database name, a database user name, a database password and a database type.

The timing task management unit is used for adding, modifying, starting, stopping, deleting and manually compensating timing tasks; the timing task management unit includes,

the timing task newly-adding module is used for newly adding a timing task through a newly-added function of timing task management; the information to be maintained by the newly added timing task comprises a task name, a convention table name, a connection name, a cron expression, a task step size, an extraction time format, extraction starting time, a time offset, a key type, an SQL type, a task SQL and a timestamp field full name, as shown in attached figures 4 and 5;

the task SQL refers to a structured query language of a data sharing task;

A task log management unit for viewing the detailed execution process of each historical task and the storage position of the generated encrypted data file in the document center, and for providing SQL viewing, log deletion and advanced operations, as shown in FIGS. 6 and 7; the task log management unit includes,

the log deleting module is used for deleting the task logs one by one;

A task counting unit, configured to count the execution processes of each task, as shown in fig. 8; the content counted by the task counting unit comprises the number of shared data lines, the size of a shared data text file, the size of a shared data encryption file and the time length for connecting a database.

The task monitoring unit is used for monitoring any task in real time, checking and analyzing the running state of the task in a visual chart mode, and providing all-around data sharing task monitoring information for an administrator, as shown in the attached figures 9 and 10;

and the failure report unit is used for generating a report for each failed task, so that the administrator can see at a glance which stage of the failed task went wrong, as shown in FIG. 11.

As shown in fig. 2, the data sharing system provides two operation modes, which are as follows:

The working process of the data sharing system is as follows:

(1) a system administrator logs in and newly adds a database of data to be shared through a database management function;

(2) configuring a timing task and a data sharing rule and starting a task;

(3) the data sharing system extracts data from a source database at regular time according to the configuration of a system administrator to form a data text file through a defined rule;

(4) after the data text file is formed, the data sharing system can archive and compress the text files and encrypt the text files through an RSA asymmetric algorithm to form an encrypted file, and then the system can upload the encrypted file to an unstructured database management system (document center for short) in an enterprise and record an access link;

(5) the data sharing system provides an external data sharing interface, through which a developer can obtain the download link;

(6) the developer calls the link to download the encrypted file from the document center, and decrypts it with the key distributed in advance to obtain the data.

The invention is suitable for retail industry.

Example 2:

the invention discloses a high-performance and high-reliability data sharing method based on a distributed idea, which comprises the following specific steps:

The data sharing system can automatically split a sharing task with a huge data volume along the time dimension according to the task step size set by an administrator and execute the resulting tasks in a distributed manner, thereby reducing the pressure on the source database;

firstly, calling an interface provided by a data sharing system;

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A high-performance and high-reliability data sharing system based on a distributed idea is characterized in that the data sharing system comprises,

the task counting unit is used for counting the execution process of each task;

2. A high-performance high-reliability data sharing system based on distributed thought according to claim 1, wherein the database connection managing unit includes,

3. The distributed concept-based high-performance high-reliability data sharing system according to claim 2, wherein the newly added content of the source database to be shared includes a connection name, a connection address, a connection port number, a database name, a database user name, a database password, and a database type.

4. A high performance high reliability data sharing system based on distributed thought according to claim 1 wherein the timed task management unit includes,

5. The distributed thought-based high-performance high-reliability data sharing system according to claim 4, wherein the information to be maintained by the newly added timing task includes a task name, a convention table name, a connection name, a cron expression, a task step size, an extraction time format, extraction start time, a time offset, a key type, an SQL type, a task SQL, and a timestamp field full name;

the task SQL refers to a structured query language of a data sharing task;

6. A high-performance high-reliability data sharing system based on distributed thought according to claim 1, wherein the task log managing unit includes,

the log deleting module is used for deleting the task logs one by one;

7. The distributed concept-based high-performance high-reliability data sharing system according to claim 1, wherein the statistics of the task statistics unit include the number of data lines to be shared, the size of the text file of the shared data, the size of the encrypted file of the shared data, and the length of time for connecting the database.

8. The distributed concept based high-performance and high-reliability data sharing system according to claim 1, wherein the data sharing system provides two operation modes, specifically as follows:

9. A high-performance and high-reliability data sharing method based on a distributed idea is characterized by comprising the following steps:

S1, a system administrator configures and starts a data sharing task; the task periodically extracts data from a source database according to a defined rule to form a data text file, and the data text file is compressed, encrypted and uploaded to a document center and an access link is recorded;

10. The distributed thought-based high-performance high-reliability data sharing method according to claim 9, wherein the data sharing system is capable of automatically splitting a sharing task with a huge data volume according to a task step size set by an administrator and executing the resulting tasks in a distributed manner, thereby reducing the pressure on a source database;

firstly, calling an interface provided by a data sharing system;