CN107301094A - Dynamic adaptive data model for large-scale dynamic transaction queries - Google Patents
Dynamic adaptive data model for large-scale dynamic transaction queries
- Publication number
- CN107301094A (application CN201710325734.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- workload
- processing
- dynamic
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1443—Transmit or communication errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, comprising the following steps: collecting data in real time from data sources such as console, RPC, text, tail, log system, and exec; under high throughput, regulating the speed of data acquisition and data processing in the real-time scenario, reducing the delay with which the system handles large-scale dynamic workloads, and ensuring the stability of the system; processing each database query request in the workload, extracting valid partition information, and obtaining a real-time data model; continuously processing the data in the workload, where the number of processing units can be adjusted dynamically according to the scale of the workload and multiple processing units can work in parallel; and writing the results to a distributed file system and storing them in a MySQL database. The invention uses a streaming framework, allocates resources rationally within a distributed cluster, and improves robustness.
Description
Technical field
The present invention relates to a method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, and more particularly to a system for constructing such a model.
Background
In cloud computing environments oriented toward big data, massive amounts of data are generated rapidly, and interactions between users and applications, and among applications, are increasingly frequent. User requests have become personalized and real-time. Large-scale OLAP (On-Line Analytical Processing) and OLTP (On-Line Transaction Processing) applications therefore need to process workloads immediately.
Summary of the invention
The technical problem to be solved by the invention is to provide a dynamic adaptive data model method for large-scale dynamic transaction queries and a system implementation based on the Storm streaming framework.
The technical solution of the invention to the above technical problem is as follows. A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises the following steps:
Step 1: Collect data in real time from data sources such as console, RPC, text, tail, log system, and exec;
Step 2: Under high throughput, regulate the speed of data acquisition and data processing in the real-time scenario, reducing the delay with which the system handles large-scale dynamic workloads and ensuring the stability of the system;
Step 3: Process each database query request in the workload, extract valid partition information, and obtain a real-time data model;
Step 4: Continuously process the data in the workload; the number of processing units can be adjusted dynamically according to the scale of the workload, and multiple processing units can work in parallel;
Step 5: Write the results to a distributed file system and store them in a MySQL database.
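As an illustration of the "tail" style of collection named in Step 1, the following is a minimal Python sketch that reads only the lines appended to a log file since the last poll. It is not the patent's implementation (the embodiment uses Flume); the function name `read_new_lines` and the byte-offset bookkeeping are assumptions made for this example.

```python
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended after byte `offset`, plus the new offset.

    Hypothetical helper: a real collector such as Flume also handles file
    rotation, checkpointing and delivery guarantees, which are omitted here.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so a half-written line is left for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Calling the function repeatedly with the returned offset yields each completed log line exactly once, which is the behaviour a downstream streaming stage would typically expect from its source.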
The beneficial effects of the invention are as follows. A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries is proposed that is combined with a streaming framework. Partition information is mapped by building an association matrix, and the horizontal scaling mechanism of the streaming framework is used to achieve high scalability and adaptability to high throughput. Experimental results indicate that the algorithm is an effective means of real-time data partitioning for large-scale dynamic workloads in a big data environment.
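The mapping from workload to association matrix can be sketched as follows. The patent does not define how the degree of association is computed, so this example assumes the simplest reading: the association of two attributes is the number of queries that reference both. The helper names, the regular-expression tokenizer and the sparse dictionary representation are all assumptions of this sketch.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def attributes_in_query(sql: str, known_attributes: List[str]) -> Set[str]:
    """Return the known attributes that appear as whole identifiers in the query text."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
    return {a for a in known_attributes if a.lower() in tokens}


def update_association(counts: Dict[Tuple[str, str], int], attrs: Set[str]) -> None:
    """Increment the co-access count of every attribute pair used by one query."""
    # Sorting makes (a, b) a canonical key, so each unordered pair is stored once.
    for a, b in combinations(sorted(attrs), 2):
        counts[(a, b)] = counts.get((a, b), 0) + 1
```

Because each query only increments counters, the matrix can be updated one query at a time as the stream arrives, which matches the incremental, per-request processing the text describes.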
On the basis of the above technical solution, the invention can also be improved as follows.
Further, step 3 further comprises: using the parallel computing mechanism of the streaming framework, when computing the degree of association between each attribute pair of the association matrix M, the computation of each row is assigned to a different computing unit of the streaming framework and the rows are computed simultaneously; all intermediate results are then added together to obtain the final result.
The beneficial effect of this further scheme is that the time complexity is reduced to O(1), which improves the execution efficiency of the data partitioning algorithm.
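The row-wise parallel computation can be illustrated with a small Python sketch. It assumes that each query has already been reduced to the set of attributes it touches and that the association of two attributes is their co-access count; neither detail is specified in the source. Threads from `concurrent.futures` stand in for the computing units of the streaming framework, and the combining step here simply assembles the rows in order, since the source does not say what is summed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Set


def association_row(i: int, attributes: List[str], workload: List[Set[str]]) -> List[int]:
    """Row i of M: for each attribute j, the number of queries that use both i and j."""
    a = attributes[i]
    return [sum(1 for q in workload if a in q and b in q) for b in attributes]


def association_matrix(attributes: List[str], workload: List[Set[str]],
                       workers: int = 4) -> List[List[int]]:
    """Submit one task per row, then collect the rows in index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(lambda i: association_row(i, attributes, workload),
                        range(len(attributes)))
        return list(rows)
```

Each row depends only on the shared, read-only workload, so the tasks need no coordination. In CPython, threads mainly illustrate the structure; real speed-up would come from separate processes or, as in the patent, separate units of the streaming framework.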
Further, a system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises a data access module, a throughput adjustment module, a data processing module, a horizontal scaling module and a data storage module.
The data access module collects streaming data and adapts to high throughput. It collects data in real time from data sources such as console, RPC, text, tail, log system, and exec, and supplies real-time data for further processing by the streaming framework.
The throughput adjustment module: in a big data streaming computing environment, data acquisition speed and data processing speed are not necessarily synchronized. Under high throughput, the throughput adjustment module regulates the speed of data acquisition and data processing in the real-time scenario, reducing the delay with which the system handles large-scale dynamic workloads and ensuring the stability of the system.
The data processing module processes each database query request in the workload and obtains a real-time data model. It preprocesses the incoming workload and extracts valid partition information. It has multiple processing units that can work in parallel, which reduces time complexity.
The horizontal scaling module: with big data, the data scale exceeds the processing capacity of a single machine. When facing large-scale loads, the horizontal scaling module can scale out flexibly by adding processing units, increasing the parallelism of the algorithm and reducing its complexity.
The data storage module persists the partitioning results: it writes them to a distributed file system and stores them in a MySQL database, where these real-time results are available for further research and computation.
The beneficial effect of this further scheme is that it addresses the timeliness problem of data modeling for large-scale, dynamic, unknown workloads in a big data environment. This requires combining data model construction techniques with a streaming computing framework, so a data model construction scheme and a related system based on a streaming framework are proposed.
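One common way to keep acquisition and processing speeds in step, as the throughput adjustment module is described as doing, is a bounded buffer between the two stages. The sketch below is an in-memory stand-in written for illustration, not the patent's mechanism: the embodiment uses Kafka, which persists messages and lets consumers pull at their own pace. The function name `run_pipeline`, the buffer capacity and the placeholder processing step are assumptions.

```python
import queue
import threading
from typing import List


def run_pipeline(records: List[str], capacity: int = 2) -> List[str]:
    """Move records from a fast collector to a slower processor through a bounded buffer."""
    buffer: "queue.Queue" = queue.Queue(maxsize=capacity)
    processed: List[str] = []
    done = object()  # sentinel marking the end of the stream

    def collector() -> None:
        for r in records:
            buffer.put(r)  # blocks while the buffer is full, which slows collection down
        buffer.put(done)

    def processor() -> None:
        while True:
            item = buffer.get()
            if item is done:
                break
            processed.append(item.upper())  # placeholder for the real processing step

    threads = [threading.Thread(target=collector), threading.Thread(target=processor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

The blocking `put` is what bounds memory use and latency: when the processor falls behind, the collector is forced to wait instead of letting an unbounded backlog build up.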
Further, the system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries is characterized by:
1) Dynamic adaptive data model construction: a module for generating and dynamically updating the partitioning strategy updates the strategy after each round of data processing;
2) Fault-tolerance management: the fault-tolerance checking mechanism of the streaming framework is used. For example, Kafka is used to implement data replay: the streaming data is retained in the system for a period of time so that, when an error occurs during data processing, transmission can restart from a given point;
3) Reliability: the data access module captures data dynamically and, through throughput adjustment, ensures stable processing under high throughput. The throughput adjustment module handles unknown data through scheduling adaptation and load balancing, and can adjust the data model dynamically as the workload changes;
4) Horizontal scaling: when facing large-scale, dynamic loads, the horizontal scaling module adds data processing units, giving the system high scalability and high availability.
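The replay idea in item 2) can be sketched as an append-only log that retains records for a fixed window and can re-send everything from a chosen offset. This is a simplified in-memory illustration under assumed names (`ReplayLog`, `retention_s`); it does not use or reproduce Kafka's API.

```python
import time
from typing import List, Optional, Tuple


class ReplayLog:
    """Append-only log that keeps records for `retention_s` seconds so they can be re-sent."""

    def __init__(self, retention_s: float) -> None:
        self.retention_s = retention_s
        self._records: List[Tuple[int, float, str]] = []  # (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value: str, now: Optional[float] = None) -> int:
        """Store a record and return the offset assigned to it."""
        now = time.time() if now is None else now
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def expire(self, now: Optional[float] = None) -> None:
        """Drop records older than the retention window."""
        now = time.time() if now is None else now
        self._records = [r for r in self._records if now - r[1] <= self.retention_s]

    def replay_from(self, offset: int) -> List[str]:
        """Return every retained record at or after `offset`, in order."""
        return [value for (o, _, value) in self._records if o >= offset]
```

A consumer that fails after processing offset k can call `replay_from(k + 1)` once it recovers, provided the records have not yet aged out of the retention window.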
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method of the invention;
Fig. 2 is a structural diagram of the system of the invention.
Description of reference numerals: 1-data access module; 2-throughput adjustment module; 3-data processing module; 4-horizontal scaling module; 5-data storage module.
Detailed description
The principles and features of the invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Fig. 1 shows the flow chart of the steps of the method of the invention; Fig. 2 shows the structure of the system of the invention.
Embodiment 1
A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises the following steps:
Step 1: Data collection is implemented with Flume. Flume is a distributed, reliable and highly available system provided by Cloudera for collecting, aggregating and transporting massive logs; it can continuously collect data from different data sources. A data generator is built that produces log files in real time, and these log files serve as the data source for collection;
Step 2: Kafka is used as middleware to regulate throughput under high-throughput real-time conditions and to adapt to dynamic changes in load;
Step 3: Workload preprocessing is performed and the partitioning algorithm is run to obtain a real-time partitioning scheme. For data processing, Storm provides an API: only the Spout and Bolt functions need to be customized and the direction of data flow between Bolts specified; real-time computation over streaming big data is then achieved by executing the data flow operations;
Step 3 further comprises: using the parallel computing mechanism of the streaming framework, when computing the degree of association between each attribute pair of the association matrix M, the computation of each row is assigned to a different computing unit of the streaming framework and the rows are computed simultaneously; all intermediate results are then added together to obtain the final result.
This stage extracts the partition information in the workload and performs statistical computation. Its input is the large-scale, dynamic, unknown workload from step 1. The real-time processing characteristic of the streaming framework ensures that unknown streaming data is processed in time, and workload mapping yields an association matrix containing the partition information.
Step 4: Computation tasks in Storm can run in parallel across multiple threads, processes and servers. In addition, Zookeeper provides a distributed coordination service, so horizontal scaling can be carried out flexibly by adding physical nodes. When a large amount of data arrives, multiple processes can be started on one machine, or physical nodes can be added to increase the number of processing units; this increases the parallelism of the system, achieves horizontal scaling and reduces processing time;
Step 5: The data storage module is implemented with a MySQL database. A MySQL interface is implemented in Storm, and the partitioning results are saved to the MySQL database.
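The persistence in Step 5 can be sketched as follows. To keep the example self-contained and runnable, it uses Python's standard-library `sqlite3` module as a stand-in for MySQL; the table name `partition_result`, its columns and the attribute-to-partition mapping are assumptions, since the patent does not describe the storage schema. `INSERT OR REPLACE` is SQLite syntax; with MySQL one would typically use `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` instead.

```python
import sqlite3
from typing import Dict, List, Tuple


def save_partitions(conn: sqlite3.Connection, assignment: Dict[str, int]) -> None:
    """Persist an attribute -> partition id mapping, overwriting earlier rows for the same attribute."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS partition_result ("
        "attribute TEXT PRIMARY KEY, partition_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO partition_result (attribute, partition_id) VALUES (?, ?)",
        list(assignment.items()),
    )
    conn.commit()


def load_partitions(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Read the stored mapping back, ordered by attribute name."""
    return conn.execute(
        "SELECT attribute, partition_id FROM partition_result ORDER BY attribute"
    ).fetchall()
```

Keying the table on the attribute means that each new real-time result overwrites the previous assignment for that attribute, so the table always holds the latest partitioning scheme.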
A system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises a data access module 1, a throughput adjustment module 2, a data processing module 3, a horizontal scaling module 4 and a data storage module 5.
The data access module (1) collects streaming data and adapts to high throughput. It collects data in real time from data sources such as console, RPC, text, tail, log system, and exec, and supplies real-time data for further processing by the streaming framework.
The throughput adjustment module (2): in a big data streaming computing environment, data acquisition speed and data processing speed are not necessarily synchronized. Under high throughput, the throughput adjustment module regulates the speed of data acquisition and data processing in the real-time scenario, reducing the delay with which the system handles large-scale dynamic workloads and ensuring the stability of the system.
The data processing module (3) processes each database query request in the workload and obtains a real-time data model. It preprocesses the incoming workload and extracts valid partition information. It has multiple processing units that can work in parallel, which reduces time complexity.
The horizontal scaling module (4): with big data, the data scale exceeds the processing capacity of a single machine. When facing large-scale loads, the horizontal scaling module can scale out flexibly by adding processing units, increasing the parallelism of the algorithm and reducing its complexity.
The data storage module (5) persists the partitioning results: it writes them to a distributed file system and stores them in a MySQL database, where these real-time results are available for further research and computation.
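The decision made by the horizontal scaling module, how many processing units to run for a given load, is not spelled out in the source. One plausible rule, shown here purely as an assumption, sizes the pool so that the current backlog can be drained within a target time. The function name, its parameters and the default limits are hypothetical.

```python
import math


def required_units(backlog: int, per_unit_rate: float, target_seconds: float,
                   min_units: int = 1, max_units: int = 64) -> int:
    """Number of processing units needed to drain `backlog` records within `target_seconds`.

    `per_unit_rate` is the number of records one unit processes per second.
    The result is clamped to the range [min_units, max_units].
    """
    if per_unit_rate <= 0 or target_seconds <= 0:
        raise ValueError("per_unit_rate and target_seconds must be positive")
    needed = math.ceil(backlog / (per_unit_rate * target_seconds))
    return max(min_units, min(max_units, needed))
```

For example, a backlog of 10000 records, units that each handle 100 records per second and a 10 second target give `required_units(10000, 100, 10) == 10`. In a Storm deployment the resulting number would translate into parallelism settings or added nodes rather than a local pool size.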
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, characterized in that it comprises the following steps:
Step 1: Collect data in real time from data sources such as console, RPC, text, tail, log system, and exec;
Step 2: Under high throughput, regulate the speed of data acquisition and data processing in the real-time scenario, reducing the delay with which the system handles large-scale dynamic workloads and ensuring the stability of the system;
Step 3: Process each database query request in the workload, extract valid partition information, and obtain a real-time data model;
Step 4: Continuously process the data in the workload; the number of processing units can be adjusted dynamically according to the scale of the workload, and multiple processing units can work in parallel;
Step 5: Write the results to a distributed file system and store them in a MySQL database.
2. The method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries according to claim 1, characterized in that step 3 further comprises: using the parallel computing mechanism of the streaming framework, when computing the degree of association between each attribute pair of the association matrix M, the computation of each row is assigned to a different computing unit of the streaming framework and the rows are computed simultaneously; all intermediate results are then added together to obtain the final result.
3. The method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries according to any one of claims 1 to 2, characterized by: dynamic incremental updating; handling of unknown workloads; real-time processing, using the parallel computing mechanism of the streaming framework to improve execution efficiency; and horizontal scalability and high-throughput adaptability. WSPA combines algorithm processing with the streaming framework and uses the framework's horizontal scaling mechanism, so that when processing large-scale dynamic workloads, horizontal scaling can be achieved flexibly by adding physical nodes. In addition, by combining with data access components such as Flume and Kafka, the algorithm still performs well when facing high-throughput workloads.
4. A system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, characterized in that it comprises a data access module (1), a throughput adjustment module (2), a data processing module (3), a horizontal scaling module (4) and a data storage module (5);
The data access module (1) collects streaming data and adapts to high throughput. It collects data in real time from data sources such as console, RPC, text, tail, log system, and exec, and supplies real-time data for further processing by the streaming framework;
The throughput adjustment module (2): in a big data streaming computing environment, data acquisition speed and data processing speed are not necessarily synchronized. Under high throughput, the throughput adjustment module regulates the speed of data acquisition and data processing in the real-time scenario, reducing the delay with which the system handles large-scale dynamic workloads and ensuring the stability of the system;
The data processing module (3) processes each database query request in the workload and obtains a real-time data model. It preprocesses the incoming workload and extracts valid partition information. It has multiple processing units that can work in parallel, which reduces time complexity;
The horizontal scaling module (4): with big data, the data scale exceeds the processing capacity of a single machine. When facing large-scale loads, the horizontal scaling module can scale out flexibly by adding processing units, increasing the parallelism of the algorithm and reducing its complexity;
The data storage module (5) persists the partitioning results: it writes them to a distributed file system and stores them in a MySQL database, where these real-time results are available for further research and computation.
5. The system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries according to claim 4, characterized by:
1) Dynamic adaptive data model construction: a module for generating and dynamically updating the partitioning strategy updates the strategy after each round of data processing;
2) Fault-tolerance management: the fault-tolerance checking mechanism of the streaming framework is used. For example, Kafka is used to implement data replay: the streaming data is retained in the system for a period of time so that, when an error occurs during data processing, transmission can restart from a given point;
3) Reliability: the data access module captures data dynamically and, through throughput adjustment, ensures stable processing under high throughput. The throughput adjustment module handles unknown data through scheduling adaptation and load balancing, and can adjust the data model dynamically as the workload changes;
4) Horizontal scaling: when facing large-scale, dynamic loads, the horizontal scaling module adds data processing units, giving the system high scalability and high availability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710325734.0A CN107301094A (en) | 2017-05-10 | 2017-05-10 | Dynamic adaptive data model for large-scale dynamic transaction queries |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710325734.0A CN107301094A (en) | 2017-05-10 | 2017-05-10 | Dynamic adaptive data model for large-scale dynamic transaction queries |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107301094A true CN107301094A (en) | 2017-10-27 |
Family
ID=60137069
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710325734.0A Pending CN107301094A (en) | 2017-05-10 | 2017-05-10 | Dynamic adaptive data model for large-scale dynamic transaction queries |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301094A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108121645A (en) * | 2017-12-25 | 2018-06-05 | 深圳市分期乐网络科技有限公司 | A kind of daily record method for evaluating quality, device, server and storage medium |
CN109271395A (en) * | 2018-09-11 | 2019-01-25 | 南京轨道交通系统工程有限公司 | Extensive real time data for comprehensive monitoring system updates delivery system and method |
CN109327329A (en) * | 2018-08-31 | 2019-02-12 | 华为技术有限公司 | Data model update method and device |
CN112685403A (en) * | 2019-10-18 | 2021-04-20 | 上海同是科技股份有限公司 | High-availability framework system for hidden danger troubleshooting data storage and implementation method thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103747060A (en) * | 2013-12-26 | 2014-04-23 | 惠州华阳通用电子有限公司 | Distributed monitor system and method based on streaming media service cluster |
CN103853844A (en) * | 2014-03-24 | 2014-06-11 | 南开大学 | Hadoop-based relation table nonredundant key set identification method |
US20160105352A1 (en) * | 2014-10-09 | 2016-04-14 | Fujitsu Limited | File system, control program of file system management device, and method of controlling file system |
CN106446126A (en) * | 2016-09-19 | 2017-02-22 | 哈尔滨航天恒星数据系统科技有限公司 | Massive space information data storage management method and storage management device |
Non-Patent Citations (1)
Title |
---|
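Kang Hong (康宏) et al.: "Application-driven real-time data partitioning algorithm based on a streaming framework" (应用驱动的基于流式框架的实时数据分区算法), Application Research of Computers (《计算机应用研究》) * |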
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108121645A (en) * | 2017-12-25 | 2018-06-05 | 深圳市分期乐网络科技有限公司 | A kind of daily record method for evaluating quality, device, server and storage medium |
CN109327329A (en) * | 2018-08-31 | 2019-02-12 | 华为技术有限公司 | Data model update method and device |
CN109327329B (en) * | 2018-08-31 | 2021-11-09 | 华为技术有限公司 | Data model updating method and device |
CN109271395A (en) * | 2018-09-11 | 2019-01-25 | 南京轨道交通系统工程有限公司 | Extensive real time data for comprehensive monitoring system updates delivery system and method |
CN112685403A (en) * | 2019-10-18 | 2021-04-20 | 上海同是科技股份有限公司 | High-availability framework system for hidden danger troubleshooting data storage and implementation method thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105550323B (en) | Load balance prediction method and prediction analyzer for distributed database | |
CN107124394B (en) | Power communication network security situation prediction method and system | |
CN106534318B (en) | A kind of OpenStack cloud platform resource dynamic scheduling system and method based on flow compatibility | |
CN107391719A (en) | Distributed stream data processing method and system in a kind of cloud environment | |
CN105117497B (en) | Ocean big data principal and subordinate directory system and method based on Spark cloud network | |
CN110222029A (en) | A kind of big data multidimensional analysis computational efficiency method for improving and system | |
CN108170530B (en) | Hadoop load balancing task scheduling method based on mixed element heuristic algorithm | |
CN107301094A (en) | The dynamic self-adapting data model inquired about towards extensive dynamic transaction | |
CN106708989A (en) | Spatial time sequence data stream application-based Skyline query method | |
CN108804602A (en) | A kind of distributed spatial data storage computational methods based on SPARK | |
CN103188346A (en) | Distributed decision making supporting massive high-concurrency access I/O (Input/output) server load balancing system | |
CN102929989B (en) | The load-balancing method of a kind of geographical spatial data on cloud computing platform | |
CN104104621B (en) | A kind of virtual network resource dynamic self-adapting adjusting method based on Nonlinear Dimension Reduction | |
CN103401939A (en) | Load balancing method adopting mixing scheduling strategy | |
CN115134371A (en) | Scheduling method, system, equipment and medium containing edge network computing resources | |
CN110659278A (en) | Graph data distributed processing system based on CPU-GPU heterogeneous architecture | |
CN109034386A (en) | A kind of deep learning system and method based on Resource Scheduler | |
CN112948123B (en) | Spark-based grid hydrological model distributed computing method | |
CN110245135A (en) | A kind of extensive streaming diagram data update method based on NUMA architecture | |
CN112035995B (en) | Unstructured grid tidal current numerical simulation method based on GPU computing technology | |
CN102420850A (en) | Resource scheduling method and system | |
CN106980540A (en) | A kind of computational methods of distributed Multidimensional Discrete data | |
CN103281393A (en) | Load balancing method of aircraft distributed system stimulation | |
CN107257356B (en) | Social user data optimal placement method based on hypergraph segmentation | |
CN114077492B (en) | Prediction model training and prediction method and system for cloud computing infrastructure resources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171027 |