CN104794186B - Method for collecting training samples for a database workload response-time prediction model

Info

Publication number: CN104794186B
Authority: CN (China)
Legal status: Expired - Fee Related
Application number: CN201510171679.5A
Other languages: Chinese (zh)
Other versions: CN104794186A
Inventors: 牛保宁 (Niu Baoning), 张锦文 (Zhang Jinwen)
Current assignee: Taiyuan University of Technology
Original assignee: Taiyuan University of Technology
Priority date: 2015-04-13
Filing date: 2015-04-13
Publication date: 2015-07-22 (CN104794186A); granted publication 2017-10-27 (CN104794186B)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for collecting training samples for a database workload response-time prediction model. It is a cluster-based sample collection method comprising: (1) obtaining response data when each database workload runs in isolation; (2) obtaining response data when database workloads run in pairs; (3) computing the change in average page read time; (4) clustering the full sample space according to the change in average page read time; (5) filling the sample selection table; (6) generating the training samples. The invention reduces the number of samples the statistical model needs, preserving model accuracy while lowering the cost of building the model.

Description

Method for collecting training samples for a database workload response-time prediction model
Technical field
The invention belongs to cluster-based sample collection methods and is a method for collecting training samples for a database workload response-time prediction model.
Background
In current parallel database systems, predicting workload response time is very important: it helps the DBA tune database parameters and schedule concurrent workloads sensibly.
However, the interaction mechanisms among concurrent database workloads are extremely complex, so traditional analytical models are complicated to build and predict poorly. The existing literature therefore mainly builds statistical models to predict workload response time. A statistical model is built in three steps: sample collection, model training (regression), and model evaluation. The main literature in this area includes:
[1] Duggan J, Cetintemel U, Papaemmanouil O, et al. Performance Prediction for Concurrent Database Workloads[C]//Proc. of the 2011 ACM SIGMOD Conference (SIGMOD'2011). Athens, Greece, 2011: 337-348
[2] Ahmad M, Aboulnaga A, Babu S, et al. Modeling and Exploiting Query Interaction in Database Systems[C]//Proc. of the 17th Conference on Information and Knowledge Management (CIKM'2008). Napa Valley, US, 2008: 183-192
[3] Ahmad M, Aboulnaga A, Babu S, et al. QShuffler: Getting the Query Mix Right[C]//Proc. of the 24th International Conference on Data Engineering (ICDE'2008). Cancun, Mexico, 2008: 1415-1417
[4] Ahmad M, Duan S, Aboulnaga A, et al. Predicting Completion Times of Batch Query Workloads Using Interaction-aware Models and Simulation[C]//Proc. of the 14th International Conference on Extending Database Technology (EDBT'2011). Uppsala, Sweden, 2011: 449-460
[5] Ahmad M, Duan S, Aboulnaga A, et al. Interaction-aware Scheduling of Report Generation Workloads[J]. The VLDB Journal, 2011, 20(4): 589-615
[6] Sheikh M B, Minhas U F, Khan O Z, et al. A Bayesian Approach to Online Performance Modeling for Database Appliances Using Gaussian Models[C]//Proc. of the 8th International Conference on Autonomic Computing (ICAC'2011). Karlsruhe, Germany, 2011: 121-130
However, the sampling methods used for these statistical models do not account for interactions among workloads; they obtain samples only by targeted or random sampling of the full sample space. As the data volume grows and workload running times increase, training without a deliberate selection of samples makes model training take longer, and the cost of building the model becomes very large.
Summary of the invention
To reduce the cost and time of building the model, the invention provides a method for collecting training samples that lowers model-building cost without significantly reducing prediction accuracy.
Technical solution: the method for collecting training samples for a database workload response-time prediction model comprises the following steps:
1. Obtain the response data of each database workload running in isolation.
That is, when each workload q runs alone, record its response time, CPU time, number of logical reads, and BAL value. BAL is the Buffer Access Latency defined in [1]: the average time the database system takes to complete one physical read. In this invention it is simply called the average read time.
A workload q is an executable database workload generated from workload template C_q.
Workload templates are generated from parameterized database query and update statements; different query or update statements are treated as different templates. Workloads generated from the same template with different parameter values are treated as the same workload.
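Treating parameter variants of one statement as the same workload can be illustrated by normalizing the statement text into a template key. This is a sketch only: the function `template_key` and its regular expressions are hypothetical, and it assumes templates differ from their instances only in numeric and single-quoted string literals.

```python
import re

# Hypothetical normalizer (not part of the patent): replaces literals with '?'
# so that statements differing only in parameter values share one template key.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")   # single-quoted, '' as escaped quote
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # standalone integers/decimals
_WHITESPACE = re.compile(r"\s+")

def template_key(sql: str) -> str:
    """Return a template key for a parameterized query or update statement."""
    key = _STRING_LITERAL.sub("?", sql)
    key = _NUMBER_LITERAL.sub("?", key)
    key = _WHITESPACE.sub(" ", key).strip()
    return key.upper()

print(template_key("SELECT * FROM orders WHERE id = 42"))
print(template_key("select *  from orders where id = 7"))
```

Both calls yield the same key, so the two statements count as one workload; identifiers such as `t1` are left untouched because the number pattern requires a word boundary before the digits.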
2. Obtain the response data of database workloads running in pairs. That is, when a first workload q_i and a second workload q_j run together, record each one's response time, CPU time, number of logical reads, and BAL value. q_i and q_j are generated from two different workload templates (a first template C_{q_i} and a second template C_{q_j}).
3. Compute the change in average page read time.
The change in average page read time is defined as $\Delta T_{q_s} = T_{q_s} - T_q$, where $T_{q_s}$ is the BAL value of a workload q (generated from template C_q) in sample s, and $T_q$ is the BAL value of q running in isolation.
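As a minimal illustration of these definitions, the sketch below computes BAL as total physical-read time divided by the number of physical reads, then takes the difference between the in-sample and isolated values. The function names are hypothetical, and how the two totals are collected (for example from the DB2 snapshot used in the embodiment) is left open.

```python
def bal(total_physical_read_time: float, physical_reads: int) -> float:
    """Average time per physical read (BAL) for one run.
    Assumption: a run with no physical reads is given BAL 0."""
    if physical_reads == 0:
        return 0.0
    return total_physical_read_time / physical_reads

def delta_bal(t_q_in_sample: float, t_q_isolated: float) -> float:
    """Change in average page read time: dT_{q_s} = T_{q_s} - T_q."""
    return t_q_in_sample - t_q_isolated

t_q = bal(1200.0, 400)     # isolated run: 3.0 ms per physical read
t_q_s = bal(2000.0, 400)   # same workload inside sample s: 5.0 ms
print(delta_bal(t_q_s, t_q))  # 2.0
```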
The change in average page read time also satisfies the following formula:
where ΔT_{q/c_ij} is the BAL value of workload q when it runs paired with another workload c_ij, c_ij being the workload in sample s_j generated from query template C_{c_i}; and ΔT_{q/c_i} is the BAL value of q when it runs paired with another workload c_i, c_i being the workload in sample s generated from query template C_{c_i}.
The values ΔT_{q/c} obtained from the paired runs are used to compute ΔT_{q_s} for a workload q at higher MPL levels (MPL, Multi Programming Level: the maximum degree of parallelism of the database system, i.e. the number of workloads that can run at the same time). ΔT_{q_s} is then computed by the following formula:
4. Cluster the full sample space according to the change in average page read time.
For each workload type q at a given MPL (Multi Programming Level), cluster all of its ΔT_{q_s} values with the K-means algorithm, using Euclidean distance as the metric. The number of clusters is MPL × 2.
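The clustering step can be sketched as plain Lloyd's K-means on the scalar ΔT_{q_s} values, where Euclidean distance reduces to the absolute difference. The patent names only K-means, Euclidean distance, and MPL × 2 clusters; the random initialization, fixed seed, iteration cap, and the name `kmeans_1d` are assumptions of this sketch.

```python
import random

def kmeans_1d(values, k, iterations=100, seed=0):
    """Lloyd's K-means on scalar values; for 1-D data the Euclidean distance
    is the absolute difference. Returns one cluster label in [0, k) per value."""
    if not values:
        return []
    distinct = sorted(set(values))
    k = min(k, len(distinct))              # no more clusters than distinct values
    rng = random.Random(seed)              # fixed seed: reproducible initial centers
    centers = rng.sample(distinct, k)      # assumption: random distinct initial centers
    labels = None
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:           # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

mpl = 4
print(kmeans_1d([0.1, 0.2, 5.0, 5.1], 2))        # two well-separated groups
print(kmeans_1d([0.1, 0.2, 5.0, 5.1], mpl * 2))  # capped at 4 clusters
```

Capping k at the number of distinct values matters for small cases such as the embodiment below, where only four ΔT values exist for q1 but MPL × 2 = 8.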
5. Fill the sample selection table.
6. Generate the training samples.
The invention reduces the number of samples the statistical model needs, preserving model accuracy while lowering the cost of building the model.
Embodiment
Embodiment: suppose there are five workload types, q1, q2, q3, q4 and q5, and the MPL is 4, meaning that up to 4 workloads can run in the database at the same time. The current sample is s0(q1, q2, q3, q4). Workloads q1 to q5 are generated by five query templates C_q1 to C_q5. The database system is IBM DB2, version 9.5.
1. Obtain the response data of each workload running in isolation. The response data include response time, CPU time, number of logical reads, and the BAL value T_q.
Run q1, q2, q3, q4 and q5 one at a time and record each one's response time, CPU time, number of logical reads, and isolated BAL value. The data are obtained with the DB2 snapshot monitor command "db2 get snapshot for dynamic sql on database".
2. Obtain the response data of workloads running in pairs. Combine q1 to q5 pairwise to obtain all pairs (10 paired runs), and record for each pair the paired response time, CPU time, number of logical reads, and BAL value T_{q/c}. These data are also collected with the DB2 snapshot monitor command.
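The count of 10 paired runs follows from choosing 2 of the 5 workload types without regard to order. A short sketch of the enumeration (the list names are illustrative; actually running each pair is outside its scope):

```python
from itertools import combinations

workloads = ["q1", "q2", "q3", "q4", "q5"]
# Each unordered pair of distinct workload types is run together once.
pairs = list(combinations(workloads, 2))
print(len(pairs))  # 10
print(pairs[:3])   # [('q1', 'q2'), ('q1', 'q3'), ('q1', 'q4')]
```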
3. Compute the change in average page read time.
The range of ΔT_{q1_s0} is computed by the following formula:
The current sample is s0(q1, q2, q3, q4) with MPL = 4. One level below the current MPL is MPL = 3, and the samples that can be generated from s0 and contain q1 are s1(q1, q2, q3), s2(q1, q2, q4) and s3(q1, q3, q4).
Then:
And:
Thus the computed value of ΔT_{q1_s0} is given by:
This yields the computed value of ΔT_{q1_s0}, which is the change in average page read time of workload q1 in sample s0.
The changes in average page read time of the other three workload types are obtained in the same way.
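The lower-MPL samples s1, s2 and s3 used for q1 above are the subsets of s0 that keep q1 and drop exactly one other member. A sketch of that enumeration, with the hypothetical name `lower_mpl_subsamples` and the assumption that each workload type appears at most once in the sample:

```python
from itertools import combinations

def lower_mpl_subsamples(sample, q):
    """Samples one MPL level below `sample` that are drawn from it and still contain q.
    Assumes each workload type appears at most once in `sample`."""
    others = [w for w in sample if w != q]
    # Keep q and choose (len(sample) - 2) of the other members, giving size len(sample) - 1.
    return [(q,) + rest for rest in combinations(others, len(sample) - 2)]

s0 = ("q1", "q2", "q3", "q4")
print(lower_mpl_subsamples(s0, "q1"))
# [('q1', 'q2', 'q3'), ('q1', 'q2', 'q4'), ('q1', 'q3', 'q4')]
```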
4. Cluster the full sample space according to the change in average page read time.
For MPL = 4, the samples containing q1 are s0(q1, q2, q3, q4), s4(q1, q2, q4, q5), s5(q1, q3, q4, q5) and s6(q1, q2, q3, q5).
Compute ΔT_{q1_s0}, ΔT_{q1_s4}, ΔT_{q1_s5} and ΔT_{q1_s6} for these samples, then cluster the four values with K-means.
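The four MPL = 4 samples containing q1 are exactly the ways of adding 3 of the remaining 4 workload types to q1. A sketch, assuming (as in this embodiment) that a sample never repeats a workload type; the name `samples_containing` is hypothetical:

```python
from itertools import combinations

def samples_containing(types, q, mpl):
    """All samples of size `mpl` over distinct workload types that include q."""
    others = [t for t in types if t != q]
    return [(q,) + rest for rest in combinations(others, mpl - 1)]

types = ["q1", "q2", "q3", "q4", "q5"]
for s in samples_containing(types, "q1", 4):
    print(s)
# ('q1', 'q2', 'q3', 'q4')  -> s0
# ('q1', 'q2', 'q3', 'q5')  -> s6
# ('q1', 'q2', 'q4', 'q5')  -> s4
# ('q1', 'q3', 'q4', 'q5')  -> s5
```

In production settings where the MPL exceeds the number of workload types, samples must repeat types, and `itertools.combinations_with_replacement` would be the natural replacement for `combinations` here.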
In a real production environment there may be more than 20 workload types and MPL levels between 30 and 200, so for each workload type q at a given MPL there are many samples containing q. K-means clustering is applied to the set of ΔT_{q_s} values, and the number of clusters is usually chosen as MPL × 2.
5. Fill the sample selection table.
For each clustered sample s, every workload it contains carries a value identifying its class.
For example, for s0(q1, q2, q3, q4) one possible classification result is K_s0(3, 1, 7, 4): ΔT_{q1_s0} falls in class 3 of the full sample space, ΔT_{q2_s0} in class 1, ΔT_{q3_s0} in class 7, and ΔT_{q4_s0} in class 4.
Each sample s has a corresponding classification result K_s.
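The classification result K_s is simply the tuple of class numbers of each member's ΔT value, in member order. A sketch with a hypothetical lookup `labels_by_type[q][s]` standing in for the output of the clustering step:

```python
def class_vectors(samples, labels_by_type):
    """Build K_s for each sample: the class of dT_{q_s} for every member q, in member order.
    labels_by_type[q][s] is the class assigned to dT_{q_s} by the clustering step."""
    return {s: tuple(labels_by_type[q][s] for q in s) for s in samples}

s0 = ("q1", "q2", "q3", "q4")
labels = {"q1": {s0: 3}, "q2": {s0: 1}, "q3": {s0: 7}, "q4": {s0: 4}}
print(class_vectors([s0], labels))  # {('q1', 'q2', 'q3', 'q4'): (3, 1, 7, 4)}
```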
Clustering yields the following table:
Based on the classification results above, fill in the following sample selection table:
Because this example contains few workload types, the sample table has some vacancies. In real production, some positions conflict, so some positions cannot be filled. In that case the method falls back to random selection and combines samples for the positions left unfilled.
6. Generate the training samples.
The sample selection table obtained in step 5 is the required set of model training samples.
The invention provides the following filling algorithm:
Input: workload templates C; MPL = M
Output: selected sample set SampleSeled

SampleSpace = GenerateSampleSpace(M, C)      /* generate the full sample space */
For S_j ∈ SampleSpace
    ComputeDIF_BAL(S_j)                      /* compute ΔT_{q_s} for every workload type in S_j */
End For
For i = 1 to |C|
    Kmeans(q_i, {ΔT_{q_i_S_j} : S_j ∈ SampleSpace}, M*2)   /* cluster all ΔT values of type q_i into M*2 clusters */
End For
For S_j ∈ SampleSpace
    ComputeMutual(S_j)                       /* mutual-exclusion number Mu_j: if S_j is inserted first, the number of other samples in SampleSpace that can no longer be inserted */
End For
Sort(SampleSpace by Mu_j)                    /* order the sample space by Mu, smallest first */
MaxInsNum = 0                                /* largest number of inserted samples found so far */
For i = 1 to K                               /* K is the number of filling trials */
    InsertS(S_i)                             /* start from an empty table and insert S_i first */
    InsertNum = 1
    For m = i+1 to |SampleSpace|
        If IsInsertS(S_m)                    /* can S_m still be inserted? */
            InsertS(S_m)                     /* insert the remaining insertable samples in order */
            InsertNum++
        End If
    End For
    If InsertNum > MaxInsNum                 /* this trial placed more samples than the best so far */
        MaxInsNum = InsertNum
        RecordInsertS()                      /* save the current filled samples */
    End If
End For
RandomInsertS()                              /* combine samples randomly for positions left empty */
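The filling loop can be sketched in Python under stated assumptions, since the patent does not spell out the table layout: here the table has one cell per (workload type, class) pair, a sample occupies the cells given by its members and K_s, and a sample is insertable only if none of its cells is already taken. Mu is then the number of other samples sharing at least one cell. The names `occupied_cells` and `fill_selection_table` are hypothetical, and the final random completion (RandomInsertS) is not implemented; the sketch returns the empty cells it would have to fill.

```python
def occupied_cells(sample, k_vector):
    """Table positions a sample fills: one (workload type, class) cell per member."""
    return set(zip(sample, k_vector))

def fill_selection_table(class_vectors, trials):
    """Greedy sketch of the filling loop. class_vectors maps sample -> K_s.
    Returns (chosen samples of the best trial, cells left empty)."""
    occ = {s: occupied_cells(s, k) for s, k in class_vectors.items()}
    samples = list(occ)
    # Mu(s): number of other samples that could no longer be inserted once s is placed.
    mu = {s: sum(1 for t in samples if t != s and occ[s] & occ[t]) for s in samples}
    order = sorted(samples, key=lambda s: mu[s])   # smallest Mu first (stable sort)
    best = []
    for i in range(min(trials, len(order))):
        # Each trial starts from an empty table with order[i] inserted first.
        filled, chosen = set(occ[order[i]]), [order[i]]
        for s in order[i + 1:]:
            if not (occ[s] & filled):              # insertable: no occupied cell clashes
                filled |= occ[s]
                chosen.append(s)
        if len(chosen) > len(best):                # keep the trial that placed most samples
            best = chosen
    all_cells = set().union(*occ.values())
    covered = set().union(*(occ[s] for s in best))
    return best, all_cells - covered

cv = {("q1", "q2"): (1, 1), ("q1", "q3"): (1, 2)}
print(fill_selection_table(cv, trials=2))  # ([('q1', 'q2')], {('q3', 2)})
```

In the usage example both samples claim cell (q1, class 1), so only one can be placed and cell (q3, class 2) stays empty; this is the kind of conflict for which the method falls back to random combination.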

Claims (1)

1. A method for collecting training samples for a database workload response-time prediction model, comprising the following steps:
(1) obtaining the response data of each database workload running in isolation;
(2) obtaining the response data of database workloads running in pairs;
(3) computing the change in average page read time;
the change in average page read time is defined as $\Delta T_{q_s} = T_{q_s} - T_q$, where $T_{q_s}$ is the BAL value of workload q in sample s and $T_q$ is the BAL value of q running in isolation;
and the change in average page read time satisfies the following formula:
where ΔT_{q/c_ij} is the BAL value of workload q when it runs paired with another workload c_ij, c_ij being the workload in sample s_j generated from query template C_{c_i}; and ΔT_{q/c_i} is the BAL value of q when it runs paired with another workload c_i, c_i being the workload in sample s generated from query template C_{c_i};
the values ΔT_{q/c} obtained from the paired runs are used to compute ΔT_{q_s} of a workload q at a higher MPL (the maximum degree of parallelism of the database system), and ΔT_{q_s} is then computed by the following formula:
MPL denotes the number of workloads that can run at the same time;
BAL denotes the average time the database system takes to complete one physical read;
(4) clustering the full sample space according to the change in average page read time;
(5) filling the sample selection table;
(6) generating the training samples.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201510171679.5A (CN104794186B) | 2015-04-13 | 2015-04-13 | Method for collecting training samples for a database workload response-time prediction model

Publications (2)

Publication Number | Publication Date
CN104794186A | 2015-07-22
CN104794186B | 2017-10-27

Family

ID=53558978

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201510171679.5A (CN104794186B, Expired - Fee Related) | Method for collecting training samples for a database workload response-time prediction model | 2015-04-13 | 2015-04-13

Country Status (1)

Country | Link
CN (1) | CN104794186B

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105512264B * | 2015-12-04 | 2019-04-19 | 贵州大学 (Guizhou University) | Performance prediction method for concurrent workloads in a distributed database
CN108052614B * | 2017-12-14 | 2021-12-03 | 太原理工大学 (Taiyuan University of Technology) | Scheduling method for database system load
CN113157814B * | 2021-01-29 | 2023-07-18 | 东北大学 (Northeastern University) | Query-driven intelligent workload analysis method for relational databases

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
US20070299965A1 * | 2006-06-22 | 2007-12-27 | Jason Nieh | Management of client perceived page view response time
CN104113590A * | 2014-06-30 | 2014-10-22 | 南京邮电大学 (Nanjing University of Posts and Telecommunications) | Replica selection method based on replica response-time prediction

Non-Patent Citations (2)

Title
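Performance Prediction for Concurrent Database Workloads; Jennie Duggan et al.; SIGMOD'2011; 2011; pp. 337-348 *
Adaptive management of transactional workloads in database systems (数据库系统交易型负载自适应管理); 赵建光 (Zhao Jianguang) et al.; Computer Engineering and Applications (计算机工程与应用); 2013; Vol. 49, No. 6, pp. 131-135 *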

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171027

Termination date: 20210413
