
CN105630458B - The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network - Google Patents

The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network Download PDF

Info

Publication number
CN105630458B
CN105630458B (application CN201511019177.7A)
Authority
CN
China
Prior art keywords
neural network
average throughput
instruction
hidden layer
throughput rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201511019177.7A
Other languages
Chinese (zh)
Other versions
CN105630458A (en)
Inventor
张阳
王伟
蒋网扣
王芹
赵煜健
凌明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SOUTHEAST UNIVERSITY SUZHOU INSTITUTE
Original Assignee
Southeast University - Wuxi Institute Of Technology Integrated Circuits
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University - Wuxi Institute Of Technology Integrated Circuits filed Critical Southeast University - Wuxi Institute Of Technology Integrated Circuits
Priority to CN201511019177.7A priority Critical patent/CN105630458B/en
Publication of CN105630458A publication Critical patent/CN105630458A/en
Application granted granted Critical
Publication of CN105630458B publication Critical patent/CN105630458B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842Speculative instruction execution
    • G06F9/3844Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting the steady-state average throughput of an out-of-order processor using an artificial neural network. Microarchitecture-independent parameters of the target program's execution phases are collected in the full-simulation environment of an instruction-set simulator; SOM and k-means algorithms then extract feature points from the input data; finally, a BP neural network fits the relationship between the microarchitecture-independent parameters and the steady-state average throughput, yielding a model of high accuracy. Once the model is trained, the microarchitecture-independent information of a program obtained from the simulator can be fed into the trained network to predict the actual steady-state average throughput quickly and accurately. By adopting an artificial neural network, the invention greatly improves both the accuracy and the speed of predicting the steady-state average throughput of an out-of-order processor.

Description

A Method for Predicting the Steady-State Average Throughput of an Out-of-Order Processor Based on an Artificial Neural Network

Technical Field

The invention relates to a method for predicting the steady-state average throughput of an out-of-order processor based on an artificial neural network, and belongs to the field of hardware/software co-design.

Background

Pre-silicon architecture evaluation and design-space exploration based on hardware behavior modeling provide guidance for chip design and shorten design iteration cycles. For a given processor running a given program, the steady-state average throughput of an out-of-order processor characterizes the performance limit of the processor when no miss events occur, and also reflects, to some extent, how well the application's design matches the hardware. The steady-state average throughput is likewise useful for subsequent analytical modeling of the processor's overall performance.

Understanding of the steady-state average throughput of out-of-order processors has gone through two stages. The first stage simply took the width of the front-end issue stage as the steady-state average throughput, on the assumption that when no miss events occur, the processor can process as many instructions per clock as the front-end issue width. This ignores instruction dependences, the number and types of functional units, instruction latencies, and the distribution of serializing instructions, and is therefore a very coarse-grained assumption. The second stage held that the steady-state average throughput is related to the front-end issue width, the critical path length, and the number and types of functional units, but that it is limited only by whichever single factor is most restrictive. Compared with the first approach this considers more factors, yet it is still confined to a single limiting factor and fails to capture the coupling among the factors.
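The "second-stage" view described above can be sketched as a min-of-limits bound. This is an illustrative model, not code from the patent; all parameter names and numbers are hypothetical.

```python
# Illustrative sketch of the "single most restrictive factor" model:
# steady-state throughput is bounded by the issue width, the dependence-chain
# (critical path) limit, and the functional-unit limit, taken as a minimum.

def steady_state_throughput(issue_width, window_size, critical_path_len,
                            functional_unit_limit):
    """Upper bound on average instructions issued per cycle (IPC)."""
    ilp_limit = window_size / critical_path_len  # dependence-chain limit
    return min(issue_width, ilp_limit, functional_unit_limit)

# Example: a 4-wide front end, 40-entry window, critical path of length 20,
# and functional units sufficient for 3 instructions/cycle.
print(steady_state_throughput(4, 40, 20, 3))  # dependence chain limits IPC to 2.0
```

The patent's criticism is precisely that this `min()` ignores how the factors couple, which motivates the neural-network fit that follows.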

The steady-state average throughput of an out-of-order processor is the average number of instructions issued per clock when no miss events occur. With high instruction-level parallelism and sufficient back-end functional units, the steady-state average throughput equals the front-end issue width D, which is also the ideal value. When there are strong dependences between instructions — for example, when the data needed by one instruction is produced by the result of the previous instruction — the average number of instructions issued per clock decreases, and the longer and more numerous the dependence chains, the lower the steady-state average throughput. When the number and types of back-end functional units are insufficient, even an instruction stream with high parallelism cannot sustain the maximum average throughput D, because the hardware is constrained by unit counts, types, and execution latencies. It is also worth noting that the serializing instructions DSB, DMB, and ISB introduced in the Android system likewise limit the steady-state average throughput: a serializing instruction requires all preceding instructions or data accesses to complete before subsequent instructions may execute, so even with a highly parallel instruction stream and ample back-end functional units, the distribution of serializing instructions strongly affects the steady-state average throughput.

Finally, the steady-state average throughput is not related to each influencing factor in a simple one-factor fashion: the coupling effects among the factors also shape it, which undoubtedly makes analysis from first principles harder. Meanwhile, full simulation is prohibitively time-consuming. The invention therefore proposes an artificial-neural-network-based method for predicting the steady-state average throughput of an out-of-order processor quickly and accurately.

Summary of the Invention

Purpose of the invention: to overcome the deficiencies of the prior art, the invention provides an artificial-neural-network-based method for predicting the steady-state average throughput of an out-of-order processor, capable of predicting the throughput quickly and accurately from microarchitecture-independent parameters.

Technical solution: to achieve the above purpose, the invention adopts the following technical scheme.

A method for predicting the steady-state average throughput of an out-of-order processor based on an artificial neural network, comprising the following steps:

(1) Take the time points at which the thread number switches during instruction-set-simulator simulation as breakpoints for segmentation, thereby dividing the entire target program into segments; group and sort all segments by thread number; count the clocks contained in each segment; and delete segments whose clock count is below a threshold (e.g., 1000).
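Step (1) can be sketched as follows, assuming a simulator trace given as a list of (thread_id, clock_count) events; this trace format and the threshold default are assumptions for illustration, not the patent's actual data layout.

```python
# Minimal sketch of segment splitting: each run of identical thread ids
# between thread switches is one segment; short segments are discarded.
from itertools import groupby

def split_segments(trace, threshold=1000):
    """Split the trace at thread-switch points, group segments by thread id,
    and drop segments shorter than `threshold` clocks."""
    segments = [(tid, sum(clocks for _, clocks in run))
                for tid, run in groupby(trace, key=lambda e: e[0])]
    kept = {}
    for tid, clocks in segments:
        if clocks >= threshold:
            kept.setdefault(tid, []).append(clocks)
    return kept

trace = [(0, 800), (1, 2500), (0, 1500), (1, 300)]
print(split_segments(trace))  # {1: [2500], 0: [1500]}
```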

(2) For the segments retained in the target thread, use the instruction-set simulator to obtain each segment's microarchitecture-independent parameters, including the dynamic instruction mix (the numbers of floating-point, integer, SIMD, and load/store instructions, etc.), the critical path length (the distribution is tallied according to the back-end design; this patent tallies the distribution of critical path lengths from 1 to 40), the serializing instructions, the front-end issue rate (the distribution of issued-instruction counts is tallied according to the front-end design; this patent tallies counts from 0 to 4), and the total running time of the target thread.

(3) First, taking into account the BP neural network's requirements on input data (dynamic instruction mix, critical-path-length distribution, serializing instructions), preprocess each segment's microarchitecture-independent parameters to form the segment's parameter vector; then apply principal component analysis (PCA) to each vector for dimensionality reduction and denoising, forming the segment's MicaData set (microarchitecture-independent data set).
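The PCA step can be sketched with scikit-learn, assuming each segment's parameters form one row of a matrix; the data here is random and purely illustrative, and the 95% retained-variance target is the figure given later in the detailed description.

```python
# Sketch of PCA-based dimensionality reduction on the per-segment
# microarchitecture-independent parameter vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 segments x 60 raw features (instruction mix, path-length histogram, ...)
raw = rng.normal(size=(200, 60))

pca = PCA(n_components=0.95)   # keep components explaining >= 95% of variance
mica_data = pca.fit_transform(raw)
print(mica_data.shape[0])      # still 200 segments, now with fewer features
```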

(4) For the segments retained in the target thread, first divide all MicaData sets into N large classes (e.g., 200) with a SOM (self-organizing feature map); then divide the n-th large class into M_n small classes with the k-means clustering algorithm (typically the number of small classes is 15% of the number of fragments in the large class), 1 ≤ n ≤ N; and select, in each small class, the point nearest the center as that class's feature point. Steps (3) and (4) reduce the input data for BP-network training, and hence the training time, while preserving the main information of the original data.
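The feature-point selection in step (4) can be sketched as below. For brevity the SOM coarse-clustering stage is omitted and only the k-means refinement within one hypothetical SOM class is shown; the data and class sizes are illustrative.

```python
# Sketch of k-means feature-point selection: cluster one SOM "large class"
# into M_n small classes and keep, per small class, the real sample nearest
# to its centroid as the feature point.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
class_data = rng.normal(size=(100, 8))          # one SOM large class
M_n = max(1, int(0.15 * len(class_data)))       # ~15% of the fragment count

km = KMeans(n_clusters=M_n, n_init=10, random_state=1).fit(class_data)

feature_points = []
for c in range(M_n):
    members = np.where(km.labels_ == c)[0]
    centroid = km.cluster_centers_[c]
    dists = np.linalg.norm(class_data[members] - centroid, axis=1)
    feature_points.append(class_data[members[np.argmin(dists)]])

print(len(feature_points))  # M_n feature points feed the BP network
```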

(5) For the segments retained in the target thread, use all feature points as the input of the BP neural network, with the network's output being the target thread's steady-state average throughput; fit the input and output and, by adjusting the number of iterations and the training accuracy, train the target thread's BP neural network model.

(6) After the BP neural network model is trained, obtain the microarchitecture-independent parameter information of other threads to be predicted via the instruction-set simulator and import it into the trained model to predict the actual steady-state average throughput quickly and accurately. The other threads to be predicted include threads in the target program or threads in other applications.

Specifically, in step (5) the BP neural network has three hidden layers, with 30 neurons in the first, 15 in the second, and 15 in the third. The logsig transfer function is used between the input layer and the first hidden layer and between the first and second hidden layers; the purelin transfer function is used between the second and third hidden layers and between the third hidden layer and the output layer. The weights between the nodes of adjacent layers are adjusted with trainscg (scaled conjugate gradient), and training uses the Levenberg-Marquardt (LM) algorithm.
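A forward pass through this topology can be sketched in plain numpy: three hidden layers of 30, 15, and 15 neurons, logsig (logistic sigmoid) after the first two weight layers and purelin (identity) after the last two. The input dimension and the random weights are assumptions; in the patent the weights come from trainscg/LM training.

```python
# Numpy sketch of the described 30-15-15 BP network's forward pass.
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))   # MATLAB-style logistic sigmoid

def purelin(x):
    return x                          # identity transfer function

rng = np.random.default_rng(42)
sizes = [20, 30, 15, 15, 1]           # input dim 20 is an assumption
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [rng.normal(scale=0.1, size=b) for b in sizes[1:]]
transfer = [logsig, logsig, purelin, purelin]

def forward(x):
    for w, b, f in zip(weights, biases, transfer):
        x = f(x @ w + b)
    return x

y = forward(rng.normal(size=(5, 20)))  # 5 feature points -> 5 predictions
print(y.shape)
```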

Beneficial effects: compared with existing methods for predicting the steady-state average throughput, the method provided by the invention covers multiple microarchitecture-independent parameters that affect it, including the dynamic instruction mix, the critical-path-length distribution, and the distribution of serializing instructions. Moreover, by using a neural network the invention fully accounts for the coupling among these parameters, and the trained model predicts the steady-state average throughput accurately and quickly.

Brief Description of the Drawings

Fig. 1 is a flow chart of training the ANN model with the invention;

Fig. 2 is a block diagram of the inputs and target outputs for training and testing the neural network model;

Fig. 3 is a diagram of the neural network's layer structure.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

A method for predicting the steady-state average throughput of an out-of-order processor based on an artificial neural network, comprising the following steps:

(1) Take the time points at which the thread number switches during instruction-set-simulator simulation as breakpoints for segmentation, thereby dividing the entire target program into segments; group and sort all segments by thread number; count the clocks contained in each segment; and delete segments with fewer than 1000 clocks.

(2) For the segments retained in the target thread, use the instruction-set simulator to obtain each segment's microarchitecture-independent parameters related to the steady-state average throughput, including the dynamic instruction mix, critical path length, serializing instructions, front-end issue rate, and the total running time of the target program. Define a structure for instruction types, record the type of each instruction, and tally the distribution of each type to obtain the dynamic instruction mix. With another structure, tally, under a fixed instruction window size, the distribution of the maximum number of dependent instructions to obtain the critical-path-length distribution. By defining the instruction types to look for and counting the ISB, DSB, and DMB instructions, obtain the number of serializing instructions. At the same time, monitor the processor's front-end issue stage and count the instructions issued and the clocks consumed over a period of time or an instruction-stream segment to compute the front-end issue rate.
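The instrumentation described in step (2) can be sketched as a toy trace walk that tallies the per-type mix and the serializing barriers (ISB/DSB/DMB); the trace format here is invented for illustration.

```python
# Toy sketch of instruction-mix and serializing-instruction counting.
from collections import Counter

SERIALIZING = {"ISB", "DSB", "DMB"}

def profile(trace):
    """Return the per-type instruction ratios and the serializing count."""
    mix = Counter(op for op, *_ in trace)
    serial = sum(mix[op] for op in SERIALIZING)
    total = sum(mix.values())
    ratios = {op: n / total for op, n in mix.items()}
    return ratios, serial

trace = [("ADD",), ("LDR",), ("DMB",), ("VADD",), ("ADD",)]
ratios, serial = profile(trace)
print(serial)         # 1 serializing instruction in the trace
print(ratios["ADD"])  # 0.4
```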

(3) First, taking into account the BP neural network's input requirements, preprocess each segment's microarchitecture-independent parameters (especially those related to the dynamic instruction mix) to form the segment's parameter vector; then apply principal component analysis (selecting principal components that retain more than 95% of the original data, reducing the data volume) to each vector for dimensionality reduction and denoising, forming the segment's MicaData set.

(4) For the segments retained in the target thread, first divide all MicaData sets into N large classes with the SOM; then divide the n-th large class into M_n small classes with the k-means clustering algorithm, 1 ≤ n ≤ N; and select the center point of each small class as that class's feature point.

(5) For the segments retained in the target thread, use all feature points as the input of the BP neural network, with the network's output being the target thread's steady-state average throughput; fit the input and output and, by adjusting the number of iterations and the training accuracy, train the target thread's BP neural network model.

(6) Run the target program on the instruction-set simulator with software instrumentation stubs inserted; tally the dynamic instruction mix, the critical path length, and the distribution of serializing instructions; process the resulting data to obtain all the feature points of the relevant threads; and import them into the target thread's BP neural network model to predict the target thread's steady-state average throughput on the out-of-order processor quickly and accurately.

Fig. 1 is the flow chart of training the ANN model. After the data is extracted from the instruction-set simulator, it is grouped by thread number and preprocessed, its dimensionality is reduced with PCA, and finally the most representative feature points are selected with the SOM and k-means algorithms as the model's input, training a model of high accuracy.

Fig. 2 is a block diagram of the inputs and target outputs for training and testing the neural network model. Through instruction-set-simulator simulation we obtain the model's parameter inputs and target outputs and thus train a model of high accuracy. At prediction time, only the relevant parameters of the target application need to be obtained from the simulator and imported into the model to predict the steady-state average throughput quickly. In the figure, the solid lines show the training flow and the dashed lines the prediction flow.

Fig. 3 is a diagram of the neural network's layer structure. The number of hidden-layer nodes follows an empirical formula in which h denotes the number of hidden-layer nodes, m the number of output-layer nodes, n the number of input-layer nodes, and a a constant (1 ≤ a ≤ 10). This embodiment uses three hidden layers, with 30 neurons in the first, 15 in the second, and 15 in the third; training uses the Levenberg-Marquardt (LM) algorithm.
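The formula itself did not survive extraction; the empirical rule most commonly quoted with exactly these variable names and the constraint 1 ≤ a ≤ 10 is h = √(m + n) + a. Treating that as an assumption about the missing equation, it can be computed as:

```python
# Assumed reconstruction of the missing empirical formula: h = sqrt(m + n) + a,
# with h = hidden nodes, n = input nodes, m = output nodes, 1 <= a <= 10.
import math

def hidden_nodes(n_inputs, m_outputs, a):
    assert 1 <= a <= 10
    return round(math.sqrt(n_inputs + m_outputs) + a)

# E.g. 20 inputs, 1 output, a = 5 gives about 10 hidden nodes under this rule.
print(hidden_nodes(20, 1, 5))  # 10
```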

The above is only a preferred embodiment of the invention. It should be pointed out that those of ordinary skill in the art may make further improvements and refinements without departing from the principles of the invention, and such improvements and refinements shall also fall within the scope of protection of the invention.

Claims (4)

1.一种基于人工神经网络的乱序处理器稳态下平均吞吐率的预测方法,其特征在于:包括如下步骤:1. A prediction method of average throughput rate under the steady state of an out-of-order processor based on artificial neural network, is characterized in that: comprise the steps: (1)将指令集模拟器仿真时线程号切换的时间点作为片段分割的间断点,从而将整个目标程序划分为若干片段,对所有片段按线程号进行划分并排序,统计每个片段包含的时钟数,删除时钟数小于阈值的片段;(1) The time point when the thread number is switched during the simulation of the instruction set simulator is used as the discontinuity point of the fragment segmentation, so that the entire target program is divided into several fragments, and all fragments are divided and sorted according to the thread number, and the statistics contained in each fragment are counted. Number of clocks, delete fragments whose number of clocks is less than the threshold; (2)对目标线程中保留下来的片段,利用指令集模拟器获取每个片段的相关微架构无关参数,微架构无关参数包括动态指令流混合比、关键路径长度、串行化指令、前端指令发射速率和目标线程的运行总时间;通过定义指令类型的结构体,记录每条指令的类型,统计各类型指令的分布情况,从而得到动态指令流混合比;通过定义结构体,统计固定指令窗口大小下,存在依赖相关指令数目最大值的分布情况,从而得到关键路径长度分布情况;通过定义检索指令类型,并统计ISB、DSB、DMB指令的数目,得到串行化指令数目;与此同时,监测处理器前端指令发射级并统计一段时间或者一个指令流片段内发射出的指令数目和花费的时钟数,来计算出前端指令发射速率;(2) For the fragments retained in the target thread, use the instruction set simulator to obtain the relevant micro-architecture-independent parameters of each fragment. The micro-architecture-independent parameters include dynamic instruction flow mixing ratio, critical path length, serialized instructions, and front-end instructions. 
The launch rate and the total running time of the target thread; by defining the structure of the instruction type, recording the type of each instruction, and counting the distribution of each type of instruction, so as to obtain the dynamic instruction flow mixing ratio; by defining the structure, statistics of the fixed instruction window Depending on the size, there is a distribution of the maximum number of related instructions, so as to obtain the distribution of the critical path length; by defining the type of retrieval instructions, and counting the number of ISB, DSB, and DMB instructions, the number of serialized instructions is obtained; at the same time, Monitor the front-end instruction emission level of the processor and count the number of instructions issued and the number of clocks spent in a period of time or within an instruction stream segment to calculate the front-end instruction emission rate; (3)首先,考虑BP神经网络对输入数据的要求,对每个片段的相关微架构无关参数进行预处理,形成对应片段的相关微架构无关参数向量;然后,通过主成分分析对每个相关微架构无关参数向量进行降维、去噪处理,形成对应片段的微结构无关数据集;(3) First, considering the requirements of the BP neural network for input data, preprocess the relevant micro-architecture-independent parameters of each segment to form the relevant micro-architecture-independent parameter vector of the corresponding segment; The microarchitecture-independent parameter vector is subjected to dimensionality reduction and denoising processing to form a microarchitecture-independent data set for the corresponding segment; (4)对目标线程中保留下来的片段,首先,通过自组织映射网络将所有微结构无关数据集分成N个大类;然后,通过k-均值聚类算法将第n个大类分成Mn个小类,1≤n≤N;选取每个小类中离中心点距离最近的点作为该小类的特征点;(4) For the fragments retained in the target thread, first, divide all microstructure-independent data sets into N categories through the self-organizing map network; then, divide the nth category into M n through the k-means clustering algorithm subclasses, 1≤n≤N; select the point closest to the center point in each subclass as the feature point of the subclass; 
(5)对目标线程中保留下来的片段,将所有特征点作为BP神经网络的输入,BP神经网络的输出为目标线程的稳态平均吞吐率,对BP神经网络的输入和输出进行拟合,训练得到目标线程的BP神经网络模型;(5) For the fragments retained in the target thread, all feature points are used as the input of the BP neural network, and the output of the BP neural network is the steady-state average throughput rate of the target thread, and the input and output of the BP neural network are fitted, Train the BP neural network model of the target thread; (6)BP神经网络模型训练完成后,通过指令集模拟器获得其他待预测线程的微架构无关参数信息,导入到训练好的BP神经网络模型中,即可快速准确地预测实际稳态平均吞吐率值。(6) After the training of the BP neural network model is completed, obtain the microarchitecture-independent parameter information of other threads to be predicted through the instruction set simulator, and import it into the trained BP neural network model to quickly and accurately predict the actual steady-state average throughput rate value. 2.根据权利 要求1所述的基于人工神经网络的乱序处理器稳态下平均吞吐率的预测方法,其特征在于:所述步骤(5)中,BP神经网络有三个隐含层,第一隐含层采用30个神经元,第二隐含层采用15个神经元,第三隐含层采用15个神经元;输入层和第一隐含层之间、第一隐含层和第二隐含层之间采用logsig传递函数,第二隐含层和第三隐含层之间、第三隐含层和输出层之间采用purelin传递函数,各层结点之间的权重值均使用trainscg进行调节,训练方法采用LM算法。2. the predictive method of average throughput rate under the out-of-order processor steady state based on artificial neural network according to claim 1, it is characterized in that: in described step (5), BP neural network has three hidden layers, the first The first hidden layer uses 30 neurons, the second hidden layer uses 15 neurons, and the third hidden layer uses 15 neurons; between the input layer and the first hidden layer, between the first hidden layer and the second hidden layer The logsig transfer function is used between the second hidden layer, the purelin transfer function is used between the second hidden layer and the third hidden layer, and between the third hidden layer and the output layer, and the weight values between the nodes of each layer are equal Use trainscg to adjust, and the training method uses the LM algorithm. 
3. The method for predicting the steady-state average throughput of an out-of-order processor based on an artificial neural network according to claim 1, characterized in that in step (1) the threshold is 1000, i.e., segments whose clock count is less than 1000 are deleted.

4. The method for predicting the steady-state average throughput of an out-of-order processor based on an artificial neural network according to claim 1, characterized in that in step (6) the other threads to be predicted include threads in the target program, or threads in other applications.
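The segment filtering of claim 3 (delete segments whose clock count is below 1000) reduces to a simple threshold test. The sketch below uses made-up segment records; the field names are assumptions, not the patent's data layout.

```python
# Step (1) of claim 1 with the claim-3 threshold: keep only segments
# whose clock count is at least 1000.
THRESHOLD = 1000

segments = [
    {"id": 0, "clocks": 450},    # too short: dropped
    {"id": 1, "clocks": 2300},   # kept
    {"id": 2, "clocks": 1000},   # exactly at threshold: kept
]

kept = [s for s in segments if s["clocks"] >= THRESHOLD]
```

Dropping very short segments removes samples whose measured throughput is dominated by transient (non-steady-state) behavior, which would otherwise add noise to the training set.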
CN201511019177.7A 2015-12-29 2015-12-29 The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network Expired - Fee Related CN105630458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511019177.7A CN105630458B (en) 2015-12-29 2015-12-29 The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network

Publications (2)

Publication Number Publication Date
CN105630458A CN105630458A (en) 2016-06-01
CN105630458B true CN105630458B (en) 2018-03-02

Family

ID=56045450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511019177.7A Expired - Fee Related CN105630458B (en) 2015-12-29 2015-12-29 The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network

Country Status (1)

Country Link
CN (1) CN105630458B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628731B (en) * 2017-03-16 2020-12-22 华为技术有限公司 Method for selecting test instruction and processing equipment
CN110178123B (en) * 2017-07-12 2020-12-01 华为技术有限公司 Performance index evaluation method and device
CN108519906B (en) * 2018-03-20 2022-03-22 东南大学 A Modeling Method for Steady-State Instruction Throughput of Superscalar Out-of-Order Processors
CN108762811B (en) * 2018-04-02 2022-03-22 东南大学 A method for obtaining out-of-order memory access behavior patterns of applications based on clustering
CN111078291B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN109409014B (en) * 2018-12-10 2021-05-04 福州大学 Calculation method of annual sunshine hours based on BP neural network model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1645839A (en) * 2005-01-25 2005-07-27 南开大学 Communicating network exchanging system and controlling method based on parallel buffer structure
CN101609416A (en) * 2009-07-13 2009-12-23 清华大学 A method to improve the performance tuning speed of distributed system
US8831205B1 (en) * 2002-03-07 2014-09-09 Wai Wu Intelligent communication routing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100241600A1 (en) * 2009-03-20 2010-09-23 Nokia Corporation Method, Apparatus and Computer Program Product for an Instruction Predictor for a Virtual Machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on dynamic adjustment methods for stabilizing the instruction throughput of out-of-order processors; Dong Zhengyang; China Master's Theses Full-text Database, Information Science and Technology Series; 2013-07-15 (No. 07); I137-34 *

Also Published As

Publication number Publication date
CN105630458A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN105630458B (en) The Forecasting Methodology of average throughput under a kind of out-of order processor stable state based on artificial neural network
İpek et al. Efficiently exploring architectural design spaces via predictive modeling
Mayer et al. Predictable low-latency event detection with parallel complex event processing
CN105653790B (en) A kind of out-of order processor Cache memory access performance estimating method based on artificial neural network
Xiao et al. A load balancing inspired optimization framework for exascale multicore systems: A complex networks approach
CN111144542A (en) Oil well productivity prediction method, device and equipment
CN105260794A (en) Load predicting method of cloud data center
Yu et al. Integrating clustering and learning for improved workload prediction in the cloud
CN111427750A (en) A GPU power consumption estimation method, system and medium of a computer platform
CN108885579B (en) Method and apparatus for data mining from kernel tracing
US10102323B2 (en) Micro-benchmark analysis optimization for microprocessor designs
CN107316200B (en) Method and device for analyzing user behavior period
Zhang et al. Rethinking influence functions of neural networks in the over-parameterized regime
CN103678004A (en) Host load prediction method based on unsupervised feature learning
CN108509723B (en) LRU Cache prefetching mechanism performance gain evaluation method based on artificial neural network
CN106297296B (en) A kind of fine granularity hourage distribution method based on sparse track point data
CN112560373B (en) Burr power analysis and optimization engine
CN103365731A (en) Method and system for reducing soft error rate of processor
CN104657198A (en) Memory access optimization method and memory access optimization system for NUMA (Non-Uniform Memory Access) architecture system in virtual machine environment
Metz et al. Towards neural hardware search: Power estimation of cnns for gpgpus with dynamic frequency scaling
CN111651946A (en) A method for hierarchical identification of circuit gates based on workload
WO2018032897A1 (en) Method and device for evaluating packet forwarding performance and computer storage medium
Liang et al. Prediction method of energy consumption based on multiple energy-related features in data center
CN118195088A (en) A building load prediction method based on transfer learning and related system
CN106649067A (en) Performance and energy consumption prediction method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190321

Address after: 215123 Linquan Street 399, Dushu Lake Higher Education District, Suzhou Industrial Park, Jiangsu Province

Patentee after: Southeast University Suzhou Institute

Address before: 214135 No. 99 Linghu Avenue, Wuxi New District, Wuxi City, Jiangsu Province

Patentee before: SOUTHEAST UNIVERSITY-WUXI INSTITUTE OF INTEGRATED CIRCUIT TECHNOLOGY

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180302