CN104794186B - Method for acquiring training samples for a database load response time prediction model - Google Patents
- Publication number
- CN104794186B (application number CN201510171679.5A)
- Authority
- CN
- China
- Prior art keywords
- load
- sample
- database
- page read
- bal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for acquiring training samples for a database load response time prediction model, belonging to the field of clustering-based sample collection. It comprises: (1) obtaining response data when each database load runs in isolation; (2) obtaining response data when database loads run in pairs; (3) calculating the change in average page read time; (4) clustering the full sample space according to the change in average page read time; (5) filling the sample selection table; (6) generating the training samples. The invention reduces the number of samples the statistical model requires while maintaining model accuracy, and thereby lowers the cost of building the model.
Description
Technical field
The invention belongs to the field of clustering-based sample collection methods and is a method for acquiring training samples for a database load response time prediction model.
Background
In current parallel database systems, predicting load response time is extremely important: it helps the DBA tune database parameters and schedule parallel loads sensibly.
However, the mechanism by which concurrent database loads influence one another (interaction) is extremely complex, so traditional analytical models are complicated to build and predict poorly. The existing literature therefore mainly builds statistical models to predict load response time. A statistical model is built in three steps: sample collection, model training (regression), and model evaluation. The main works in this area are:
[1] Duggan J, Cetintemel U, Papaemmanouil O, et al. Performance Prediction for Concurrent Database Workloads[C] // Proc. of 2011 ACM SIGMOD Conference (SIGMOD'2011). Athens, Greece, 2011: 337-348
[2] Ahmad M, Aboulnaga A, Babu S, et al. Modeling and Exploiting Query Interaction in Database Systems[C] // Proc. of the 17th Conference on Information and Knowledge Management (CIKM'2008). Napa Valley, US, 2008: 183-192
[3] Ahmad M, Aboulnaga A, Babu S, et al. Qshuffler: Getting the Query Mix Right[C] // Proc. of the 24th International Conference on Data Engineering (ICDE'2008). Cancun, Mexico, 2008: 1415-1417
[4] Ahmad M, Duan S, Aboulnaga A, et al. Predicting Completion Times of Batch Query Workloads Using Interaction-aware Models and Simulation[C] // Proc. of the 14th International Conference on Extending Database Technology (EDBT'2011). Uppsala, Sweden, 2011: 449-460
[5] Ahmad M, Duan S, Aboulnaga A, et al. Interaction-aware Scheduling of Report Generation Workloads[J]. The VLDB Journal, 2011, 20(4): 589-615
[6] Sheikh M B, Minhas U F, Khan O Z, et al. A Bayesian Approach to Online Performance Modeling for Database Appliances Using Gaussian Models[C] // Proc. of the 8th International Conference on Autonomic Computing (ICAC'2011). Karlsruhe, Germany, 2011: 121-130
However, the sampling methods used with these statistical models do not take the interaction between loads into account; they obtain samples only by targeted or random sampling from the full sample space. As the volume of database data grows, load running time grows as well. If training samples are not selected carefully, model training takes longer and the cost of building the model becomes very high.
Summary of the invention
To reduce the cost of building the model and shorten the time needed to build it, the invention provides a method for acquiring training samples that lowers the model-building cost without significantly reducing prediction accuracy.
Technical solution: the method for acquiring training samples for a database load response time prediction model comprises the following steps:
1. Obtain response data when each database load runs in isolation.
That is, when each load q runs in isolation, obtain its response time, CPU time, number of logical reads, and BAL value. BAL is the Buffer Access Latency defined in [1]; it represents the average time the database system takes to complete one physical read, and this document refers to it simply as the average read time. The Buffer Access Latency value comes from Duggan J, Cetintemel U, Papaemmanouil O, et al. Performance Prediction for Concurrent Database Workloads // Proc. of 2011 ACM SIGMOD Conference (SIGMOD'2011). Athens, Greece, 2011: 337-348.
A load q denotes an executable database load generated from load template Cq.
A load template is produced from a parameterized database query or update statement; different query or update statements are regarded as different load templates. Loads generated from the same load template with different parameters are regarded as the same load.
2. Obtain response data when database loads run in pairs. That is, when a first load qi and a second load qj run as a pair, obtain the response time, CPU time, number of logical reads, and BAL value of each. The first load qi and the second load qj are generated from two different load templates (the first load template Cqi and the second load template Cqj, respectively).
3. Calculate the change in average page read time.
The change in average page read time is defined as ΔTq_s = Tq_s - Tq, where Tq_s is the BAL value of a load q (generated from load template Cq) in sample s, and Tq is the BAL value of load q when it runs in isolation.
The change in average page read time also satisfies the following formula:
[formula omitted in the source text]
where ΔTq/cij denotes the change in the BAL value of load q when it runs paired with another load cij, cij being the load generated from query template CCi in sample sj; and ΔTq/ci denotes the change in the BAL value of load q when it runs paired with another load ci, ci being the load generated from query template CCi in sample s.
The ΔTq/c values obtained from the paired runs are used to calculate ΔTq_s of load q at higher MPL levels (Multi Programming Level: the maximum degree of parallelism of the database system, i.e. the number of loads that can run at the same time). ΔTq_s is then calculated by the following formula:
[formula omitted in the source text]
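The definition ΔTq_s = Tq_s - Tq can be illustrated with a minimal Python sketch. The `ResponseData` record, its field names, and the numeric values are assumptions made for illustration only; the patent specifies which quantities are collected but not how they are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseData:
    """Measurements for one load in one run (hypothetical field names)."""
    response_time: float  # seconds
    cpu_time: float       # seconds
    logical_reads: int    # number of logical reads
    bal: float            # Buffer Access Latency: average time per physical read


def bal_change(in_sample: ResponseData, isolated: ResponseData) -> float:
    """Return ΔTq_s = Tq_s - Tq for one load q."""
    return in_sample.bal - isolated.bal


# Made-up numbers: q1 measured alone and inside sample s0.
isolated_q1 = ResponseData(12.0, 3.5, 40_000, 0.004)
in_s0_q1 = ResponseData(30.0, 4.0, 40_000, 0.010)
delta = bal_change(in_s0_q1, isolated_q1)
```

A positive result means that physical reads of q1 became slower, on average, when q1 ran together with the other loads of the sample.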
4. Cluster the full sample space according to the change in average page read time.
For each type of load q, at a given MPL (Multi Programming Level), cluster all of its ΔTq_s values. The clustering method is the K-means algorithm with Euclidean distance as the metric, and the number of clusters is MPL*2.
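The clustering step can be sketched as a plain one-dimensional K-means (Lloyd's algorithm). This is an illustrative sketch rather than the patent's implementation: the function name `kmeans_1d`, the seeding strategy, and the cap on the number of clusters when there are fewer distinct values than MPL*2 are all assumptions, and the input values are made up.

```python
import random


def kmeans_1d(values, k, max_iter=100, seed=0):
    """Cluster scalars with Lloyd's algorithm; return one label per value."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))  # assumption: never more clusters than distinct values
    centers = random.Random(seed).sample(distinct, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center; |v - c| is the Euclidean distance in 1-D.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


mpl = 4
deltas_q1 = [0.006, 0.0061, 0.015, 0.0152]  # made-up ΔTq1_s values
labels = kmeans_1d(deltas_q1, k=mpl * 2)
```

With only four values and MPL*2 = 8 requested clusters, the cap leaves every value in its own cluster. With the larger sample counts described for production, the MPL*2 setting applies directly.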
5. Fill the sample selection table.
6. Generate the training samples.
The invention reduces the number of samples the statistical model requires while maintaining model accuracy, and thereby lowers the cost of building the model.
Detailed description
Embodiment: suppose five load types q1, q2, q3, q4, q5 are given and the MPL is 4, meaning that four loads can run in the database at the same time; the current sample is s0(q1, q2, q3, q4). Loads q1, q2, q3, q4, q5 are generated from the five query templates Cq1, Cq2, Cq3, Cq4, Cq5, respectively. The database system is IBM DB2, version 9.5.
1. Obtain response data when each load runs in isolation. The response data comprise the response time, CPU time, number of logical reads, and BAL value Tq.
Run loads q1, q2, q3, q4, q5 in isolation and obtain, for each, the response time, CPU time, number of logical reads, and isolated-run BAL value. The data are obtained with the DB2 snapshot monitor command "db2 get snapshot for dynamic sql on database".
2. Obtain response data when loads run in pairs. Combine q1, q2, q3, q4, q5 to obtain all pairwise combinations (10 paired runs) and, for each, the paired-run response time, CPU time, number of logical reads, and BAL value Tq/c. These data are likewise obtained with the DB2 snapshot monitor command.
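The count of 10 paired runs follows from choosing 2 of the 5 load types. A short Python sketch of the enumeration follows; the dictionary `paired_bal` is a hypothetical container for the measured Tq/c values, not something the patent defines.

```python
from itertools import combinations

load_types = ["q1", "q2", "q3", "q4", "q5"]

# Every unordered pair of distinct load types is run together once.
pairs = list(combinations(load_types, 2))

# Each paired run yields two BAL values: that of qi next to qj, and of qj next to qi.
paired_bal = {}  # (q, c) -> BAL value of q while running together with c
for qi, qj in pairs:
    paired_bal[(qi, qj)] = None  # to be filled in from the monitor output
    paired_bal[(qj, qi)] = None
```

The 10 runs therefore produce 20 Tq/c values, one per ordered pair.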
3. Calculate the change in average page read time.
The range of ΔTq1_s0 is calculated by the following formula:
[formula omitted in the source text]
The current sample is s0(q1, q2, q3, q4) with MPL = 4. The MPL one level below the current one is 3, and the samples at that level that can be derived from s0 and contain load q1 are s1(q1, q2, q3), s2(q1, q2, q4), and s3(q1, q3, q4).
Then:
[formula omitted in the source text]
and:
[formula omitted in the source text]
The calculated value of ΔTq1_s0 can therefore be given by:
[formula omitted in the source text]
This yields the calculated value of ΔTq1_s0, which represents the change in the average page read time of load q1 in sample s0.
The changes in average page read time for the other three load types are obtained in the same way.
4. Cluster the full sample space according to the change in average page read time.
For MPL = 4, all samples containing q1 are s0(q1, q2, q3, q4), s4(q1, q2, q4, q5), s5(q1, q3, q4, q5), and s6(q1, q2, q3, q5).
Calculate ΔTq1_s0, ΔTq1_s4, ΔTq1_s5, and ΔTq1_s6 for these samples, then apply K-means clustering to the four values.
In a real production environment there are more than 20 load types and the MPL lies between 30 and 200, so for each load type q at a given MPL there are many samples containing q. K-means clustering is applied to the set of ΔTq_s values, and the number of clusters is usually chosen as MPL*2.
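The sample sets named in the embodiment can be reproduced by enumeration. The sketch below assumes, as in the embodiment, that each load type appears at most once in a sample; the variable names are illustrative.

```python
from itertools import combinations

load_types = ["q1", "q2", "q3", "q4", "q5"]
mpl = 4

# Full sample space at MPL = 4: every choice of 4 of the 5 load types.
sample_space = list(combinations(load_types, mpl))

# Samples whose ΔTq1_s values are clustered together for load type q1.
samples_with_q1 = [s for s in sample_space if "q1" in s]

# Samples one MPL level lower that are derived from s0 and still contain q1.
s0 = ("q1", "q2", "q3", "q4")
sub_samples = [c for c in combinations(s0, mpl - 1) if "q1" in c]
```

Here `samples_with_q1` holds the four samples s0, s4, s5, s6 (in a different order), and `sub_samples` holds s1, s2, s3.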
5. Fill the sample selection table.
For the sample s selected from each cluster, every load it contains has a numeric value indicating its class.
For example, for s0(q1, q2, q3, q4), one possible classification result is Ks0(3, 1, 7, 4), meaning that in the full sample space ΔTq1_s0 falls in class 3, ΔTq2_s0 in class 1, ΔTq3_s0 in class 7, and ΔTq4_s0 in class 4.
Every sample s has a corresponding classification result Ks.
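A classification result Ks is simply the tuple of cluster labels of the loads in s. A minimal sketch, assuming the per-load cluster labels are kept in a nested dictionary (the name `labels_by_load` and its layout are hypothetical):

```python
def classification(sample, sample_id, labels_by_load):
    """Return Ks: the cluster class of ΔTq_s for each load q in the sample."""
    return tuple(labels_by_load[q][sample_id] for q in sample)


# Cluster labels for sample s0, taken from the example Ks0(3, 1, 7, 4).
labels_by_load = {
    "q1": {"s0": 3},
    "q2": {"s0": 1},
    "q3": {"s0": 7},
    "q4": {"s0": 4},
}
k_s0 = classification(("q1", "q2", "q3", "q4"), "s0", labels_by_load)
```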
Clustering yields the following table:
[table omitted in the source text]
Based on these classification results, the following sample selection table is filled in:
[table omitted in the source text]
Because the example contains few load types, the sample table has some vacancies. In actual production, some positions conflict, which leaves some positions unfilled. In that case the method falls back to random selection and randomly combines samples for the unfilled positions.
6. Generate the training samples.
The samples in the selection table obtained in step 5 are the required model training samples.
The invention provides the following filling algorithm:
Input: load templates C; MPL = M
Output: selected sample set SampleSeled
1. SampleSpace = GenerateSampleSpace(M, C);
2. /* generate the full sample space SampleSpace */
3. For Sj ∈ SampleSpace
   /* compute ΔTq_s of every load type in each sample */
4.   ComputeDIF_BAL(Sj);
5. End For
6. For i = 1 to C, Sj ∈ SampleSpace
   /* cluster all ΔTqi_Sj of each load type qi; the number of clusters is M*2 */
7.   Kmeans(qi, ΔTqi_Sj, M*2);
8. End For
9. For Sj ∈ SampleSpace
   /* compute the insertion mutual-exclusion number Mu of each sample; the Mu of sample s is the number of other samples in SampleSpace that can no longer be inserted once s has been inserted first */
10.  ComputeMutual(Sj);
11. End For
12. Sort(Muj);
   /* sort the sample space in ascending order of Mu */
13. MaxInsNum = 1;
   /* initialize the maximum number of inserted samples */
14. For j = 1 to K
   /* K is the number of filling rounds */
15.   InsertS(Sj);
   /* insert sample Sj first */
16.   InsertNum = 1;
17.   For m = j+1 to |SampleSpace|
18.     If (IsInsertS(Sm))
        /* check whether Sm can be inserted */
19.       InsertS(Sm);
20.       InsertNum++;
        End If
21.   End For
   /* insert the other insertable samples in turn */
22.   If (InsertNum > MaxInsNum)
23.     MaxInsNum = InsertNum;
24.     RecordInsertS();
      /* if this round fills more samples than the best scheme so far, save the current filling */
      End If
25. End For
26. RandomInsertS();
   /* for the positions still unfilled, combine samples at random */
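The greedy part of the filling algorithm (steps 9 to 25) can be sketched in Python under one explicit assumption that the text does not spell out: the selection table has one cell per (load type, cluster class) pair, and a sample can be inserted only if none of its cells is already occupied. The function names, the data layout, and all class values except Ks0(3, 1, 7, 4) are illustrative, and the final random fill (step 26) is left out.

```python
def cells(sample, classes):
    """Cells a sample would occupy: one (load type, cluster class) pair per load."""
    return set(zip(sample, classes))


def fill_table(samples, k_rounds):
    """samples: list of (sample, Ks) pairs. Returns indices of the best filling found."""
    occupied = [cells(s, ks) for s, ks in samples]
    n = len(samples)
    # Mu of sample i: how many other samples are blocked if i is inserted first.
    mu = [sum(1 for j in range(n) if j != i and occupied[i] & occupied[j])
          for i in range(n)]
    order = sorted(range(n), key=lambda i: mu[i])  # ascending Mu
    best = []
    for start in range(min(k_rounds, n)):
        used = set(occupied[order[start]])  # insert the starting sample first
        chosen = [order[start]]
        for idx in order[start + 1:]:
            if not (occupied[idx] & used):  # no cell conflict: insert it
                used |= occupied[idx]
                chosen.append(idx)
        if len(chosen) > len(best):  # keep the round that inserts the most samples
            best = chosen
    return best


samples = [
    (("q1", "q2", "q3", "q4"), (3, 1, 7, 4)),  # s0, classes from the example
    (("q1", "q2", "q4", "q5"), (3, 2, 5, 1)),  # s4, made-up classes
    (("q1", "q3", "q4", "q5"), (1, 2, 6, 2)),  # s5, made-up classes
    (("q1", "q2", "q3", "q5"), (2, 1, 7, 3)),  # s6, made-up classes
]
best = fill_table(samples, k_rounds=2)
```

In this made-up instance s0 conflicts with both s4 and s6, so it has the largest Mu and is tried last; the best filling keeps s4, s5, and s6.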
Claims (1)
1. A method for acquiring training samples for a database load response time prediction model, comprising the following steps:
(1) obtaining response data when each database load runs in isolation;
(2) obtaining response data when database loads run in pairs;
(3) calculating the change in average page read time;
the change in average page read time is defined as ΔTq_s = Tq_s - Tq, where Tq_s is the BAL value of load q in sample s and Tq is the BAL value of load q running in isolation;
and the change in average page read time satisfies the following formula:
[formula omitted in the source text]
where ΔTq/cij denotes the change in the BAL value of load q when it runs paired with another load cij, cij being the load generated from query template CCi in sample sj; and ΔTq/ci denotes the change in the BAL value of load q when it runs paired with another load ci, ci being the load generated from query template CCi in sample s;
the ΔTq/c values obtained from the paired runs are used to calculate ΔTq_s of load q at a higher MPL (the maximum degree of parallelism of the database system), and ΔTq_s is then calculated by the following formula:
[formula omitted in the source text]
MPL denotes the number of loads that can run at the same time;
BAL denotes the average time the database system takes to complete one physical read;
(4) clustering the full sample space according to the change in average page read time;
(5) filling the sample selection table;
(6) generating the training samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510171679.5A CN104794186B (en) | 2015-04-13 | 2015-04-13 | Method for acquiring training samples for a database load response time prediction model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510171679.5A CN104794186B (en) | 2015-04-13 | 2015-04-13 | Method for acquiring training samples for a database load response time prediction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104794186A CN104794186A (en) | 2015-07-22 |
CN104794186B true CN104794186B (en) | 2017-10-27 |
Family
ID=53558978
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510171679.5A Expired - Fee Related CN104794186B (en) | 2015-04-13 | 2015-04-13 | Method for acquiring training samples for a database load response time prediction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104794186B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512264B (en) * | 2015-12-04 | 2019-04-19 | 贵州大学 | Performance prediction method for concurrent workloads in a distributed database |
CN108052614B (en) * | 2017-12-14 | 2021-12-03 | 太原理工大学 | Scheduling method for database system load |
CN113157814B (en) * | 2021-01-29 | 2023-07-18 | 东北大学 | Query-driven intelligent workload analysis method under relational database |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070299965A1 (en) * | 2006-06-22 | 2007-12-27 | Jason Nieh | Management of client perceived page view response time |
CN104113590A (en) * | 2014-06-30 | 2014-10-22 | 南京邮电大学 | Copy selection method based on copy response time prediction |
Non-Patent Citations (2)
Title |
---|
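Performance Prediction for Concurrent Database Workloads; Jennie Duggan et al.; SIGMOD'2011; 20111231; pp. 337-348 * |
Adaptive management of transactional workloads in database systems; Zhao Jianguang et al.; Computer Engineering and Applications (《计算机工程与应用》); 20131231; Vol. 49, No. 6; pp. 131-135 * |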
Also Published As
Publication number | Publication date |
---|---|
CN104794186A (en) | 2015-07-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20171027; Termination date: 20210413 |