CN114897098A

CN114897098A - Automatic mixed-precision quantization method and device

Info

Publication number: CN114897098A
Application number: CN202210634034.0A
Authority: CN
Inventors: 张川; 葛荧萌; 冀贞昊; 张在琛; 黄永明; 尤肖虎
Original assignee: Network Communication and Security Zijinshan Laboratory
Current assignee: Network Communication and Security Zijinshan Laboratory
Priority date: 2022-06-06
Filing date: 2022-06-06
Publication date: 2022-08-12
Also published as: WO2023236609A1

Abstract

The present invention provides an automatic mixed-precision quantization method and device. The method includes: acquiring the intermediate variables to be quantized that are generated by a multiple-input multiple-output (MIMO) detector while it performs signal detection on a MIMO system; training an agent through a deep reinforcement learning algorithm, and quantizing the fractional bit width of each intermediate variable based on the policy stored in the trained agent; and quantizing the integer bit width of each intermediate variable based on a probability density function. The invention can automatically allocate quantization bit widths to the different intermediate variables in the MIMO detector, avoiding substantial bit-width redundancy and saving hardware resources.

Description

Automatic mixed-precision quantization method and device

Technical Field

The present invention relates to the technical field of machine learning, and in particular to an automatic mixed-precision quantization method and device.

Background

With the ever-increasing demands on the throughput and stability of communication systems, multiple-input multiple-output (MIMO) systems have received extensive attention because of their potential for high spectral efficiency. Because the computational complexity of optimal detection algorithms in massive MIMO scenarios is prohibitive for hardware implementation, many hardware-friendly detectors have been implemented, and they show clear advantages in throughput, energy efficiency, and area efficiency. However, existing hardware-friendly detectors focus mainly on arithmetic implementation rather than quantization optimization. To save design effort, most of them adopt a uniform quantization scheme that uses the same quantization bit width for all variables, which leads to substantial bit-width redundancy and wastes hardware resources.

Summary of the Invention

The present invention provides an automatic mixed-precision quantization method and device to overcome the defect in the prior art that MIMO detectors use the same quantization scheme for all variables, which causes substantial quantization bit-width redundancy and wastes hardware resources. The invention can automatically allocate quantization bit widths to the different intermediate variables in a MIMO detector, avoiding this redundancy and saving hardware resources.

The present invention provides an automatic mixed-precision quantization method, comprising:

acquiring the intermediate variables to be quantized that are generated by a MIMO detector while it performs signal detection on a MIMO system;

training an agent through a deep reinforcement learning algorithm, and quantizing the fractional bit width of each intermediate variable based on the policy stored in the trained agent;

quantizing the integer bit width of each intermediate variable.

According to the automatic mixed-precision quantization method provided by the present invention, training the agent through a deep reinforcement learning algorithm includes:

in each episode, initializing the environment, having the agent interact with the environment based on a Markov decision process, and storing the interaction data;

each time a preset number of episodes is reached, controlling the agent to update its stored policy based on the currently stored interaction data, until the maximum number of episodes is reached.

According to the automatic mixed-precision quantization method provided by the present invention, initializing the environment includes:

randomly drawing a plurality of intermediate variables;

initializing the fractional bit width of all intermediate variables to a preset maximum fractional bit width, and initializing the integer bit width of all intermediate variables to a preset maximum integer bit width;

determining a plurality of states, where the states include the indices and fractional bit widths corresponding to the drawn intermediate variables;

randomly selecting one state from the plurality of states and returning it.

According to the automatic mixed-precision quantization method provided by the present invention, having the agent interact with the environment based on a Markov decision process and storing the interaction data includes:

performing the following steps at each time step until the current time step reaches the maximum time step, and storing the interaction data generated at each time step, where the interaction data includes the state, the action value, and the reward value:

determining an action value according to the current state and the current policy, where the action value represents the change in fractional bit width;

modifying the current quantization scheme according to the action value to obtain a modified quantization scheme;

evaluating the modified quantization scheme according to a reward function to obtain a reward value, and selecting and returning the next state.

According to the automatic mixed-precision quantization method provided by the present invention, quantizing the fractional bit width of each intermediate variable based on the policy stored in the trained agent includes:

converting the policy stored in the trained agent into a statistical distribution of the fractional bit width of each intermediate variable through a Monte Carlo simulation algorithm;

quantizing the fractional bit width of each intermediate variable based on the statistical distribution of its fractional bit width;

where the policy represents the mapping between states and action values, a state includes the index and the fractional bit width corresponding to an intermediate variable, and the action value represents the change in fractional bit width.

According to the automatic mixed-precision quantization method provided by the present invention, converting the policy stored in the trained agent into a statistical distribution of the fractional bit width of each intermediate variable through a Monte Carlo simulation algorithm includes:

performing the following steps in each test until the maximum number of tests is reached, so as to obtain how often each intermediate variable takes each fractional bit width and hence the statistical distribution of the fractional bit width of each intermediate variable:

for each intermediate variable, determining an action value based on the current state of the intermediate variable and the policy stored in the trained agent;

changing the fractional bit width of the intermediate variable based on the action value.

According to the automatic mixed-precision quantization method provided by the present invention, quantizing the fractional bit width of the intermediate variable based on the statistical distribution of its fractional bit width includes:

determining the mean of the statistical distribution of the fractional bit width of the intermediate variable as the fractional bit width of that intermediate variable.
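The Monte Carlo conversion from a stored policy to a per-variable fractional bit width can be sketched as follows. This is only an illustration under stated assumptions: the function name `fractional_widths_from_policy`, the `(index, width)` tuple used as the state, the choice to start every variable at `q_max`, and the use of the wrap-around update of equation (3) are not fixed by the text at this point, and the text does not say how a non-integer mean is rounded.

```python
from collections import Counter


def fractional_widths_from_policy(policy, n_all, q_max, n_tests):
    """Run the policy n_tests times per variable and average the visited widths.

    Hypothetical helper: `policy` maps a state (k, q_k) to an integer change.
    """
    widths = {k: q_max for k in range(1, n_all + 1)}   # assumed starting point
    counts = {k: Counter() for k in widths}
    for _ in range(n_tests):
        for k in widths:
            action = policy((k, widths[k]))                   # action from the policy
            widths[k] = (widths[k] + action) % (q_max + 1)    # wrap-around update
            counts[k][widths[k]] += 1                         # frequency of each width
    means = {k: sum(w * c for w, c in counts[k].items()) / n_tests for k in counts}
    return counts, means


# Toy policy (assumption): shrink the width by one bit until it reaches 3.
toy_policy = lambda state: -1 if state[1] > 3 else 0
counts, means = fractional_widths_from_policy(toy_policy, n_all=2, q_max=8, n_tests=100)
print(means)
```

The mean is returned as a float; mapping it to an integer bit width (for example by rounding up to stay on the safe side) is a design choice left open here.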

According to the automatic mixed-precision quantization method provided by the present invention, quantizing the integer bit width of each intermediate variable includes:

generating a number of data samples of each intermediate variable through a Monte Carlo simulation algorithm to obtain a first data set;

extracting the data in the first data set that fall outside the amplitude range of the quantization scheme to obtain a second data set, where the amplitude of the quantization scheme is 2 raised to the power of the integer bit width minus 2 raised to the power of the negative fractional bit width;

determining the smallest integer bit width that satisfies a preset condition as the integer bit width of the intermediate variable, where the preset condition is that the ratio of the number of data items in the second data set to the number of data items in the first data set is less than or equal to a preset threshold.
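The integer bit-width search described above can be sketched as a simple scan over candidate widths. The function name `select_integer_width`, the use of `abs(v) > bound` as the out-of-range test, and the fallback to `p_max` when no candidate satisfies the condition are assumptions made for the example.

```python
def select_integer_width(samples, q, p_max, threshold):
    """Return the smallest integer width p whose overflow ratio is <= threshold.

    samples: the first data set (Monte Carlo samples of one intermediate variable)
    q:       the fractional bit width already chosen for that variable
    """
    for p in range(p_max + 1):
        bound = 2.0 ** p - 2.0 ** (-q)                    # quantization amplitude
        overflow = sum(1 for v in samples if abs(v) > bound)  # size of second set
        if overflow / len(samples) <= threshold:
            return p
    return p_max                                          # assumed fallback


data = [0.1, -0.4, 0.9, 3.2, -7.5]
print(select_integer_width(data, q=2, p_max=8, threshold=0.2))  # prints 2
```

With a threshold of 0, no sample may overflow and the same data require 3 integer bits, which shows how the threshold trades a small overflow probability for a narrower word.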

The present invention also provides an automatic mixed-precision quantization device, comprising:

a variable acquisition module, configured to acquire the intermediate variables to be quantized that are generated by a MIMO detector while it performs signal detection on a MIMO system;

a first quantization module, configured to train an agent through a deep reinforcement learning algorithm and to quantize the fractional bit width of each intermediate variable based on the policy stored in the trained agent;

a second quantization module, configured to quantize the integer bit width of each intermediate variable based on a probability density function.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements any of the automatic mixed-precision quantization methods described above when executing the program.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements any of the automatic mixed-precision quantization methods described above.

The present invention also provides a computer program product comprising a computer program that, when executed by a processor, implements any of the automatic mixed-precision quantization methods described above.

With the automatic mixed-precision quantization method and device provided by the present invention, an agent is trained through a deep reinforcement learning algorithm, and the fractional bit width of each intermediate variable to be quantized is quantized based on the policy stored in the trained agent, so that fractional bit widths are allocated to the different intermediate variables automatically. The integer bit width of each intermediate variable is then quantized, so that integer bit widths are likewise allocated automatically. The present invention can therefore automatically allocate quantization bit widths to the different intermediate variables in a MIMO detector, avoiding substantial bit-width redundancy and saving hardware resources.

Description of Drawings

To explain the technical solutions of the present invention or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of the automatic mixed-precision quantization method provided by the present invention;

FIG. 2 shows the reward value versus the number of network updates for different numbers of extracted variables N_ext, as provided by the present invention;

FIG. 3 compares the BER performance of the floating-point AMP detector and the fractionally quantized AMP detectors under different N_ext, as provided by the present invention;

FIG. 4 compares the average fractional bit width and the SNR loss for different values of N_ext at BER = 10^-3, as provided by the present invention;

FIG. 5 shows the reward value versus the number of network updates for different L_a, as provided by the present invention;

FIG. 6 compares the BER performance of the floating-point AMP detector, the fractionally quantized AMP detector, and the fractionally and integer quantized AMP detector, as provided by the present invention;

FIG. 7 compares the BER performance of the AMP detector provided by the present invention and uniformly quantized AMP detectors using different fractional bit widths;

FIG. 8 is a schematic structural diagram of the automatic mixed-precision quantization device provided by the present invention;

FIG. 9 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to FIG. 1, which is a schematic flowchart of the automatic mixed-precision quantization method provided by the present invention, the automatic mixed-precision quantization method for a multiple-input multiple-output (MIMO) detector mainly includes the following steps:

Step 101: acquire the intermediate variables to be quantized that are generated by the MIMO detector while it performs signal detection on the MIMO system;

Step 102: train an agent through a deep reinforcement learning algorithm, and quantize the fractional bit width of each intermediate variable based on the policy stored in the trained agent;

Step 103: quantize the integer bit width of each intermediate variable.

In step 101, regarding the MIMO system and taking the uplink as an example, consider a MIMO system in which the user side is equipped with N_t transmit antennas and the base station side with N_r receive antennas, where N_t < N_r. In general, this MIMO system can be simplified to the following real-domain mathematical model:

y = Hx + n (1)

where y denotes the received signal vector; x denotes the transmitted signal vector; H denotes the channel matrix of dimension 2N_r × 2N_t, and in this embodiment the channel is assumed to be an independent and identically distributed (i.i.d.) Rayleigh channel with mean 0 and variance 1/(2N_r), with the channel state known at the receiver; and n denotes additive white Gaussian noise with mean 0 and variance [formula missing].
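As a rough illustration of model (1), the following Python sketch draws one channel use of the real-valued system. The function name `simulate_mimo`, the ±1 symbol alphabet, and the `noise_std` parameter are assumptions made for the example: the text does not fix the modulation or reproduce the noise variance, and the channel variance is read here as 1/(2N_r).

```python
import random


def simulate_mimo(n_t, n_r, noise_std, rng):
    """Draw one realization of y = Hx + n for the real-valued model (1).

    Hypothetical helper: the +/-1 alphabet and noise_std are assumptions.
    """
    rows, cols = 2 * n_r, 2 * n_t
    h_std = (1.0 / (2 * n_r)) ** 0.5   # channel entries: mean 0, variance 1/(2*N_r)
    H = [[rng.gauss(0.0, h_std) for _ in range(cols)] for _ in range(rows)]
    x = [rng.choice((-1.0, 1.0)) for _ in range(cols)]    # assumed symbol alphabet
    n = [rng.gauss(0.0, noise_std) for _ in range(rows)]  # additive Gaussian noise
    y = [sum(H[i][j] * x[j] for j in range(cols)) + n[i] for i in range(rows)]
    return H, x, n, y


H, x, n, y = simulate_mimo(n_t=2, n_r=4, noise_std=0.1, rng=random.Random(0))
print(len(y), len(x))  # 2*N_r received samples, 2*N_t transmitted symbols
```

A detector operating on `y` and `H` would produce the intermediate variables whose bit widths the method allocates; the detector itself is outside the scope of this sketch.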

The MIMO detector performs signal detection on the MIMO system. In this embodiment, the MIMO detector may be a Bayesian message passing (BMP) detector; this embodiment is not limited thereto, and the MIMO detector may also be another detector.

In this step, the intermediate variables to be quantized that are generated by the MIMO detector while it performs signal detection on the MIMO system are acquired. Different MIMO detectors generate different intermediate variables.

In step 102, the agent is trained through a deep reinforcement learning algorithm, and the fractional bit width of each intermediate variable to be quantized is quantized based on the policy stored in the trained agent, so that fractional bit widths are allocated to the different intermediate variables automatically.

Optionally, in step 102, training the agent through a deep reinforcement learning algorithm includes the following sub-steps:

Step 1021: in each episode, initialize the environment, have the agent interact with the environment based on a Markov decision process, and store the interaction data;

Step 1022: determine whether the preset number of episodes has been reached; if so, go to step 1023; if not, go to step 1021;

Step 1023: control the agent to update its stored policy based on the currently stored interaction data;

Step 1024: determine whether the maximum number of episodes has been reached; if so, training ends; if not, go to step 1021.
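The control flow of steps 1021 to 1024 can be sketched as a loop that collects data every episode and updates the policy every `update_every` episodes. The names `train`, `run_episode`, and `update_policy` are hypothetical placeholders, and keeping (rather than clearing) the stored data after an update is an assumption, since the text only refers to "the currently stored interaction data".

```python
def train(run_episode, update_policy, max_episodes, update_every):
    """Skeleton of steps 1021-1024; both callbacks are hypothetical placeholders."""
    replay = []        # stored interaction data: (state, action, reward) tuples
    updates = 0
    for episode in range(1, max_episodes + 1):
        replay.extend(run_episode())          # step 1021: interact and store
        if episode % update_every == 0:       # step 1022: preset count reached?
            update_policy(replay)             # step 1023: update the stored policy
            updates += 1
    return updates                            # leaving the loop is step 1024


seen = []
n_updates = train(
    run_episode=lambda: [("state", 0, 0.0)] * 3,          # stub: 3 transitions
    update_policy=lambda data: seen.append(len(data)),    # stub: record data size
    max_episodes=10,
    update_every=5,
)
print(n_updates, seen)  # prints: 2 [15, 30]
```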

In step 1021, deep reinforcement learning (DRL), a branch of machine learning (ML), focuses on interacting with an environment in order to make correct decisions. In this embodiment, the interaction between the agent and the environment is assumed to be a Markov decision process (MDP).

In each episode, the environment is initialized, the agent interacts with the environment based on the Markov decision process, and the interaction data is stored during the interaction.

In step 1023, each time the preset number of episodes is reached, the agent updates its stored policy once based on the currently stored interaction data.

In this embodiment, the agent interacts continuously with the environment based on the Markov decision process and is trained through a deep reinforcement learning algorithm, so that the agent learns a policy.

Optionally, in step 1021, initializing the environment may include:

Step 10211: randomly draw a plurality of intermediate variables;

Step 10212: initialize the fractional bit width of all intermediate variables to the preset maximum fractional bit width, and initialize the integer bit width of all intermediate variables to the preset maximum integer bit width;

Step 10213: determine a plurality of states, where the states include the indices and fractional bit widths corresponding to the drawn intermediate variables;

Step 10214: randomly select one state from the plurality of states and return it.

In step 10211, given the strict bit-error-rate (BER) requirements of MIMO detection and the strong correlation between different intermediate variables, considering all intermediate variables in one episode would introduce serious reward misjudgment and lead to poor quantization results. Therefore, in one episode, N_ext (1 < N_ext ≤ N_all) intermediate variables are randomly drawn to obtain the variable set S_ext, where N_ext is the preset number of randomly drawn intermediate variables, N_all is the total number of intermediate variables, and S_ext is the set of randomly drawn intermediate variables. Although only the correlations among the N_ext drawn variables are considered in a single episode, the correlations among all variables will be covered as long as the number of episodes is sufficient.

In step 10213, the state is defined as a vector (oh(k), oh(q_k)), where k ∈ {1, 2, ..., N_all} is the index of the k-th intermediate variable to be quantized; q_k ∈ {0, 1, ..., q_max} is the fractional bit width of the k-th intermediate variable; q_max is the preset maximum value of q_k, that is, the preset maximum fractional bit width of the k-th intermediate variable; and the operation oh() returns the one-hot encoding of its argument.

In this step, the fractional bit widths of the randomly drawn intermediate variables are initialized to the preset maximum fractional bit width q_max, so the states corresponding to these intermediate variables are (oh(k'), oh(q_max)), k' ∈ S_ext.

In step 10214, one state (oh(k'), oh(q_k')), k' ∈ S_ext, is randomly selected from the plurality of states and returned.

In this embodiment, a plurality of intermediate variables is first drawn at random; the fractional bit widths of all intermediate variables are then initialized to the preset maximum fractional bit width and their integer bit widths to the preset maximum integer bit width; finally, one state is randomly selected from the states corresponding to the drawn intermediate variables and returned, which completes the initialization of the environment.
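The initialization in steps 10211 to 10214 can be sketched as below. The function names `one_hot` and `reset_environment`, the dictionary layout, and the choice to concatenate the two one-hot codes into a single list are assumptions made for illustration.

```python
import random


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def reset_environment(n_all, n_ext, q_max, p_max, rng):
    """Hypothetical environment reset following steps 10211-10214."""
    s_ext = rng.sample(range(1, n_all + 1), n_ext)        # 10211: draw N_ext indices
    frac = {k: q_max for k in range(1, n_all + 1)}        # 10212: max fractional width
    integ = {k: p_max for k in range(1, n_all + 1)}       # 10212: max integer width
    states = {k: one_hot(k - 1, n_all) + one_hot(frac[k], q_max + 1)
              for k in s_ext}                             # 10213: (oh(k), oh(q_k))
    k0 = rng.choice(s_ext)                                # 10214: pick one state
    return s_ext, frac, integ, k0, states[k0]


s_ext, frac, integ, k0, state = reset_environment(
    n_all=6, n_ext=3, q_max=8, p_max=8, rng=random.Random(0))
print(sorted(s_ext), k0, len(state))
```

The state has N_all + q_max + 1 entries with exactly two of them set, one for the variable index and one for its current fractional bit width.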

Optionally, in step 1021, having the agent interact with the environment based on a Markov decision process and storing the interaction data includes:

Step 10215: at each time step, determine the action value according to the current state and the current policy;

Step 10216: modify the current quantization scheme according to the action value to obtain the modified quantization scheme;

Step 10217: evaluate the modified quantization scheme according to the reward function to obtain the reward value, select and return the next state, and go to step 10218; the interaction data generated at each time step, which includes the state, the action value, and the reward value, is stored at the same time;

Step 10218: determine whether the current time step has reached the maximum time step; if so, go to step 1022; if not, go to step 10215.

To make the interaction between the agent and the environment easier to follow, the quantization scheme, the action, and the reward function are first described as follows:

(1) Quantization scheme

Because linear quantization can be implemented efficiently in hardware, this embodiment uses linear quantization throughout. The specific quantization scheme is as follows: the sign part, the integer part, and the fractional part take 1, p, and q bits respectively, abbreviated as 1-p-q. For an intermediate variable with value v, the quantized value v_Q can be expressed as:

v_Q = round(clip(v, -B, B)/C) × C (2)

where B = 2^p - 2^(-q) and C = 2^(-q).
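Equation (2) can be checked numerically with a few lines of Python. The function name `quantize` is hypothetical, and the rounding mode is an assumption: the text only says "round", so round-half-up via `floor(x + 0.5)` is used here instead of Python's built-in `round`, which rounds halves to the nearest even integer.

```python
import math


def quantize(v, p, q):
    """Quantize v to the signed fixed-point format 1-p-q as in equation (2)."""
    step = 2.0 ** (-q)                        # C: weight of the least significant bit
    bound = 2.0 ** p - step                   # B: largest representable magnitude
    clipped = max(-bound, min(bound, v))      # clip(v, -B, B)
    return math.floor(clipped / step + 0.5) * step   # assumed round-half-up


print(quantize(0.3, p=2, q=2))     # nearest multiple of 0.25
print(quantize(10.0, p=2, q=2))    # saturates at B = 3.75
print(quantize(-10.0, p=2, q=2))   # saturates at -B = -3.75
```

Reducing q coarsens the grid (larger C), while reducing p lowers the saturation bound B; these are the two effects that the fractional and integer bit-width allocations trade against BER.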

(2) Action

The action represents the change in fractional bit width. If the bit width after taking the action (denoted q'_k) is not in the set {0, 1, ..., q_max}, q'_k could be clipped into the range [0, q_max]. However, this would increase the probability of taking 0 or q_max (values less than 0 or greater than q_max are forced to 0 or q_max), making the agent more likely to stay at the two extreme points. Therefore, if q'_k is greater than q_max, counting continues from 0; if q'_k is less than 0, counting continues from q_max. The change in fractional bit width can thus be expressed as:

q'_k = (q_k + a_t) mod (q_max + 1) (3)

After the agent takes action a_t, the fractional bit width of the k-th variable becomes q'_k.
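The difference between the wrap-around rule of equation (3) and plain clipping can be seen in a short sketch. Both function names are hypothetical; the wrap-around version relies on Python's `%` operator returning a non-negative result for a positive modulus.

```python
def update_wrap(q_k, a_t, q_max):
    """Equation (3): wrap the new width around the range 0..q_max."""
    return (q_k + a_t) % (q_max + 1)


def update_clip(q_k, a_t, q_max):
    """Alternative discussed in the text: clip the new width into [0, q_max]."""
    return max(0, min(q_max, q_k + a_t))


print(update_wrap(8, 1, 8), update_clip(8, 1, 8))    # prints: 0 8
print(update_wrap(0, -1, 8), update_clip(0, -1, 8))  # prints: 8 0
print(update_wrap(3, 2, 8), update_clip(3, 2, 8))    # prints: 5 5
```

Inside the valid range the two rules agree; they differ only at the boundaries, where clipping keeps the agent at an extreme point and wrapping moves it to the opposite end.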

The influence range of the action is defined as the maximum absolute value of the change in fractional bit width, denoted L_a. The action space is then the set [formula missing]. When L_a = 1, the fractional bit width can change by at most 1 bit, which may lead to low learning efficiency and makes it easy to get trapped in a local optimum. When [formula missing], the agent can flexibly change the current fractional bit width to any other fractional bit width, but this may make the learning process unstable and degrade convergence.

(3) Reward function

The reward depends mainly on the BER performance and the fractional bit width. Because MIMO detection has strict performance requirements, the reward function is designed to reduce the fractional bit width while guaranteeing BER performance.

Evaluation of BER performance: Monte Carlo simulation is commonly used to test the BER performance of a given detection algorithm. Because an accurate BER requires a large number of Monte Carlo samples, it can be precomputed before the agent starts learning and is denoted P_b. When evaluating the BER performance of the quantized detector, the environment simulates both the floating-point detector and the quantized detector with a small number of samples to save computation time. The corresponding BERs of the floating-point detector and the quantized detector are written as [formula missing] and [formula missing]. The BER relative error of the quantized detector is defined as [formula missing]. However, because only a small number of samples is simulated when evaluating the BER performance of the quantized detector, [formula missing] may equal 0. Therefore, [formula missing] is used as the relative error instead. If the relative error is greater than the threshold ε_2, the performance of the quantized detector cannot be guaranteed, and the environment returns the reward r_t = -1. Otherwise, the environment goes on to consider the fractional bit width.

The average fractional bit width of all variables in the current quantization scheme is denoted [formula missing]. The reward function when the relative error is less than ε_2 is defined as [formula missing], where θ_1 and θ_2 are empirical parameters. Compared with a linear function, an exponential function lets the environment return larger reward values when the fractional bit width is small, which makes the agent more inclined to reduce the fractional bit width. In summary, the reward function is as follows: [formula missing]
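The two-branch structure of the reward can be illustrated as follows. This is a sketch under strong assumptions, not the patent's formula: the exact expressions are not reproduced in this text, so both the relative-error form (difference of the small-sample BERs divided by the precomputed P_b) and the exponential form θ_1·exp(-θ_2·q̄) are hypothetical stand-ins chosen only to match the qualitative description (a penalty of -1 when the BER constraint is violated, otherwise a reward that grows as the average fractional bit width shrinks).

```python
import math


def reward(ber_quant, ber_float, p_b, q_bar, eps2, theta1, theta2):
    """Illustrative reward; both formulas below are assumptions, not the source's.

    ber_quant, ber_float: small-sample BER estimates of the two detectors
    p_b:   precomputed accurate BER (must be > 0)
    q_bar: average fractional bit width of the current quantization scheme
    """
    rel_err = (ber_quant - ber_float) / p_b      # assumed relative-error form
    if rel_err > eps2:
        return -1.0                              # BER performance not guaranteed
    return theta1 * math.exp(-theta2 * q_bar)    # assumed exponential form


print(reward(0.02, 0.01, 0.01, q_bar=4.0, eps2=0.5, theta1=1.0, theta2=0.1))
print(reward(0.01, 0.01, 0.01, q_bar=4.0, eps2=0.5, theta1=1.0, theta2=0.1))
```

Any monotonically decreasing function of the average width would express the same preference; the text motivates the exponential choice by its larger rewards at small widths compared with a linear function.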

然后，环境选择下一个中间变量，记为第k'个中间变量，状态更新为(oh(k'),oh(q_k'))。Then, the environment selects the next intermediate variable, denoted as the k'th intermediate variable, and the state is updated to (oh(k'), oh(q _k' )).

Based on the above description, steps 10215-10218 are described in detail below:

In step 10215, at time t, the action value a_t is determined according to the current state and the current policy. The state and the action value belong to the state space and the action space, respectively, and the policy represents the mapping between states and action values.

In step 10216, the current quantization scheme is modified according to the action value a_t to obtain a modified quantization scheme.

Current quantization scheme: v_Q = round(clip(v, -B, B)/C) × C, where B = 2^p - 2^(-q_k) and C = 2^(-q_k).

Modified quantization scheme: v_Q = round(clip(v, -B', B')/C') × C', where q′_k = q_k + a_t, B' = 2^p - 2^(-q′_k) and C' = 2^(-q′_k).
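The quantization rule v_Q = round(clip(v, -B, B)/C) × C can be sketched in a few lines. The sketch takes B = 2^p - 2^(-q) as stated for the 1-p-q scheme and assumes the step size C = 2^(-q). The bounds `q_min` and `q_max` in `apply_action` are hypothetical, and Python's built-in `round` resolves ties to the nearest even integer, which may differ from the tie-breaking rule of a particular hardware implementation.

```python
def quantize(v, p, q):
    """Fixed-point quantization of v with p integer and q fractional bits."""
    B = 2.0 ** p - 2.0 ** (-q)      # largest representable magnitude
    C = 2.0 ** (-q)                 # quantization step (assumed 2^-q)
    clipped = max(-B, min(B, v))    # clip(v, -B, B)
    return round(clipped / C) * C   # snap to the nearest multiple of C


def apply_action(q_k, a_t, q_min=0, q_max=8):
    """Update q'_k = q_k + a_t, kept inside hypothetical bounds."""
    return max(q_min, min(q_max, q_k + a_t))
```

For example, with p = 2 and q = 2 the value 0.3 is quantized to 0.25 and the value 10 saturates at 3.75; raising q to 4 through an action a_t = 2 quantizes 0.3 to 0.3125 instead.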

In step 10217, the modified quantization scheme is evaluated according to the reward function to obtain the reward value r_t, and the next state (oh(k'), oh(q_k')) is selected and returned.

In step 10218, the agent interacts with the environment once at each time step, that is, steps 10215-10217 are executed once. The current time t is the time step at which the agent is interacting with the environment, and the maximum time t_max is the preset maximum time step for interaction between the agent and the environment. Steps 10215-10217 are repeated until the current time t reaches the maximum time t_max, and the interaction data generated at each time step, comprising the state, the action value, and the reward value, are stored.
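The interaction loop of steps 10215-10218 can be sketched as follows. `ToyEnv` is a hypothetical stand-in for the real environment: its reward is a placeholder rather than the BER-based reward of this embodiment, and the state is kept as a plain (k, q_k) pair instead of a one-hot vector to keep the example short.

```python
import random


class ToyEnv:
    """Hypothetical environment tracking one fractional bit width per variable."""

    def __init__(self, num_vars, q_max, seed=0):
        self.rng = random.Random(seed)
        self.q_max = q_max
        self.q = [q_max] * num_vars  # widths start at the preset maximum

    def reset(self):
        k = self.rng.randrange(len(self.q))
        return (k, self.q[k])

    def step(self, state, action):
        k, _ = state
        # Step 10216: modify the quantization scheme, q'_k = q_k + a_t.
        self.q[k] = max(0, min(self.q_max, self.q[k] + action))
        # Step 10217: placeholder reward, then select the next variable.
        reward = -1.0 if self.q[k] < 2 else 1.0 / (1 + self.q[k])
        k_next = self.rng.randrange(len(self.q))
        return (k_next, self.q[k_next]), reward


def run_episode(policy, env, t_max):
    """Run t_max interaction steps and return the stored transitions."""
    buffer = []
    state = env.reset()
    for _ in range(t_max):
        action = policy(state)                         # step 10215
        next_state, reward = env.step(state, action)   # steps 10216-10217
        buffer.append((state, action, reward))         # stored interaction data
        state = next_state
    return buffer
```

Calling `run_episode(lambda s: -1, ToyEnv(3, 8), 5)` returns five (state, action, reward) tuples, which is the data an agent would later use to update its policy.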

Optionally, in step 102, quantizing the fractional bit width of each intermediate variable based on the policy stored in the trained agent includes:

Step 1025: converting the policy stored in the trained agent into a statistical distribution of the fractional bit width of each intermediate variable through a Monte Carlo simulation algorithm;

Step 1026: determining the fractional bit width of the intermediate variable based on the statistical distribution of the fractional bit width of the intermediate variable.

Step 1025 may include the following sub-steps:

Step 10251: in each test, for each intermediate variable, determining an action value based on the current state of the intermediate variable and the policy stored in the trained agent;

Step 10252: changing the fractional bit width of the intermediate variable based on the action value;

Step 10253: determining whether the maximum number of tests has been reached; if so, proceeding to step 10254; if not, returning to step 10251;

Step 10254: obtaining the frequency with which each intermediate variable takes each fractional bit width;

Step 10255: obtaining the statistical distribution of the fractional bit width of each intermediate variable.

In step 10251, in each test, the environment is fixed so that the intermediate variable it selects is always the k-th intermediate variable, and the action value a_t is determined according to the current state (oh(k), oh(q_k)) and the current policy.

In step 10252, the fractional bit width of the k-th intermediate variable is changed based on the action value a_t: q′_k = q_k + a_t.

In step 10254, after the maximum number of tests is reached, the frequency with which the k-th intermediate variable takes each fractional bit width is obtained.

In step 10255, once the maximum number of tests is sufficiently large, these frequencies approximate the statistical distribution of the fractional bit width of the k-th variable learned by the agent.

In this embodiment, the Monte Carlo simulation algorithm converts the policy stored in the trained agent into a statistical distribution of the fractional bit width of each intermediate variable.

In step 1026, based on the statistical distribution of the fractional bit width of the intermediate variable, the mean of that distribution is determined as the fractional bit width of the intermediate variable.

In this embodiment, the mean of the statistical distribution of the fractional bit width of the k-th variable is calculated as an estimate of the expected value of q_k.
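Steps 10251-10255 and step 1026 can be sketched as a simple tally. The policy here is a hypothetical stochastic stand-in for the trained policy network, the bounds `q_min` and `q_max` are assumptions, and the sketch assumes that the bit width carries over from one test to the next.

```python
import random
from collections import Counter


def make_toy_policy(seed=0):
    """Hypothetical stand-in for the trained policy: returns a_t in {-1, 0, 1}."""
    rng = random.Random(seed)

    def policy(k, q_k):
        return rng.choice([-1, 0, 1]) if q_k <= 4 else -1

    return policy


def width_distribution(policy, k, q_init, n_tests, q_min=0, q_max=8):
    """Estimate the distribution and mean of the fractional bit width of variable k."""
    counts = Counter()
    q_k = q_init
    for _ in range(n_tests):
        a_t = policy(k, q_k)                       # step 10251
        q_k = max(q_min, min(q_max, q_k + a_t))    # step 10252
        counts[q_k] += 1                           # tally used in step 10254
    dist = {q: c / n_tests for q, c in counts.items()}   # step 10255
    mean_q = sum(q * pr for q, pr in dist.items())       # step 1026
    return dist, mean_q
```

A call such as `width_distribution(make_toy_policy(), k=0, q_init=8, n_tests=1000)` returns the empirical distribution together with its mean, which serves as the estimate of the expected value of q_k.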

Optionally, step 103 may include the following sub-steps:

Step 1031: generating a number of samples of each intermediate variable through a Monte Carlo simulation algorithm to obtain a first data set;

Step 1032: extracting the data in the first data set that are not within the amplitude range of the quantization scheme to obtain a second data set, wherein the amplitude of the quantization scheme is the difference between 2 raised to the power of the integer bit width and 2 raised to the power of the negative fractional bit width;

Step 1033: determining the minimum integer bit width that satisfies a preset condition as the integer bit width of the intermediate variable, wherein the preset condition is that the ratio of the number of data in the second data set to the number of data in the first data set is less than or equal to a preset threshold.

In step 1031, a number of samples of the k-th intermediate variable are generated by the Monte Carlo simulation algorithm to obtain the first data set S.

In step 1032, for a 1-p-q quantization scheme, the amplitude of the quantization scheme is B = 2^p - 2^-q and the amplitude range is [-B, B]. The data in the first data set S that are not within [-B, B] are extracted to form the second data set S′ = {v | |v| > |B|, v ∈ S}.

In step 1033, the minimum integer bit width that satisfies card(S′)/card(S) ≤ ε₁ is determined as the integer bit width of the intermediate variable, that is, p = min{p | card(S′)/card(S) ≤ ε₁}, where ε₁ denotes the preset threshold. When ε₁ = 0, card(S′) = 0, which means that the quantization scheme covers all values of the variable.
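The search in steps 1032-1033 can be sketched as a loop over candidate integer bit widths. The function name, the starting value p = 0, and the upper limit `p_max` are assumptions made for the example.

```python
def min_integer_bits(samples, q, eps1=1e-4, p_max=16):
    """Return the smallest p with card(S')/card(S) <= eps1 for B = 2^p - 2^-q."""
    if not samples:
        raise ValueError("samples must be non-empty")
    n = len(samples)                                      # card(S)
    for p in range(p_max + 1):
        B = 2.0 ** p - 2.0 ** (-q)                        # amplitude of the 1-p-q scheme
        outside = sum(1 for v in samples if abs(v) > B)   # card(S')
        if outside / n <= eps1:
            return p
    raise ValueError("no p <= p_max satisfies the threshold")
```

For the samples [0.1, 0.5, 1.2, 3.0] with q = 2, a threshold of 0 yields p = 2 because B = 3.75 covers every sample, whereas a threshold of 0.25 tolerates one sample in four falling outside the range and yields p = 1.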

In this embodiment, quantizing the integer bit width of each intermediate variable automatically allocates the integer bit widths of the different intermediate variables.

In summary, the present invention provides an automatic mixed-precision quantization method for a multiple-input multiple-output (MIMO) detector. The agent is trained through a deep reinforcement learning algorithm, and the fractional bit width of each intermediate variable to be quantized is quantized based on the policy stored in the trained agent, which automatically allocates the fractional bit widths of the different intermediate variables. The integer bit width of each intermediate variable is then quantized, which automatically allocates the integer bit widths of the different intermediate variables. The present invention therefore automatically allocates the quantization bit widths of the different intermediate variables in the MIMO detector, avoids a large amount of quantization bit width redundancy, and saves hardware resources.

The method provided in this embodiment is verified below with a specific example.

This embodiment uses the following configuration as an example. The transmitted signal vector is drawn uniformly at random from a 16QAM constellation, the channel matrix is drawn at random from a Rayleigh channel model with 8 transmit antennas and 128 receive antennas, and the MIMO detector is an approximate message passing (AMP) detector with the number of iterations set to 4. Simulation experiments show that ε₁ = 10^-4, ε₂ = 0.8, θ₁ = 10, and θ₂ = 4 achieve a good trade-off between BER performance and quantization bit width. These hyperparameters are therefore fixed in the following simulation experiments.

Both the policy network and the value network are fully connected DNNs with 6 hidden layers. The policy network stores the policy learned by the agent, and the value network evaluates the value of the current state. The dimensions of the 6 hidden layers are 64, 128, 256, 256, 128, and 64, respectively. The input layers of both networks have the same dimension as the state space. The output layer of the policy network has the same dimension as the action space, and the output layer of the value network has dimension 1. The learning rate is set to 0.002. The agent uses the Proximal Policy Optimization (PPO) algorithm with the discount factor set to 0.99 and the clipping parameter ε set to 0.2, and learns the fractional bit width of each variable until the maximum number of rounds is reached.
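The two PPO hyperparameters above enter the algorithm as follows. This is a minimal sketch of the standard discounted return and the standard PPO clipped surrogate term for a single sample, not the training code of this embodiment; the function names are chosen for the example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every time step of one round."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Clipped surrogate term min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With ε = 0.2, a probability ratio of 1.5 on a positive advantage contributes no more than 1.2 times that advantage, which limits how far a single update can move the policy.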

The simulation results are analyzed as follows:

1) Effect of N_ext

The relationship between the reward returned by the environment and the number of iterations is shown in FIG. 2. For each value of N_ext, the reward first increases with the number of iterations and then fluctuates around a certain value, which indicates that the agent is learning how to better quantize the intermediate variables of the detection algorithm. When N_ext = 1, the correlation between different intermediate variables is not considered, so the agent can reduce the fractional bit widths of the variables to very low values, which corresponds to the larger reward value shown in FIG. 2. As N_ext increases, the reward value at convergence becomes smaller. This is mainly because the reward misjudgment problem makes it harder for the agent to reduce the fractional bit width.

FIG. 3 compares the BER performance of the floating-point AMP detector with that of the fractionally quantized AMP detector of this embodiment for different values of N_ext. The legend entry "N_ext = 1" denotes the performance curve of the fractionally quantized AMP detector of this embodiment when N_ext = 1, and the other entries are read in the same way. Although the agent allocates the smallest fractional bit widths when N_ext = 1, the BER performance degrades severely. With N_ext = 5, N_ext = 15, and N_ext = 21, the fractionally quantized AMP detector performs essentially the same as the floating-point AMP detector.

FIG. 4 plots the average fractional bit width against the SNR loss at BER = 10^-3 for different values of N_ext. The horizontal axis is the SNR loss of the fractionally quantized AMP detector of this embodiment, for each N_ext, relative to the floating-point AMP detector used as the baseline. The vertical axis is the average fractional bit width. As shown in FIG. 4, the average fractional bit width is lowest when N_ext = 1, but the performance loss is large. When N_ext = 5, the fractionally quantized AMP detector allocates fewer fractional bits while maintaining the performance of the original algorithm. The performance loss for N_ext = 15 and N_ext = 21 differs little from that for N_ext = 5, but the average fractional bit width is larger. N_ext is therefore fixed to 5 in the following analysis.

2) Effect of L_a

FIG. 5 shows how the reward value changes with the number of network updates for different values of L_a. Compared with L_a = 1, convergence is slightly better with L_a = 2. In both cases, the reward value eventually converges to approximately the same value. When L_a = 5, convergence becomes very slow and the reward value at convergence is smaller than in the other two cases. Therefore, when L_a is too large, neither the convergence speed nor the final fractional bit width is satisfactory. In the following discussion, L_a is fixed to 2.

3) Integer quantization

After the fractional bit widths are obtained, the integer bit widths can be calculated according to expression (5) above. FIG. 6 compares the BER performance of the floating-point AMP detector, the fractionally quantized AMP detector, and the AMP detector with both fractional and integer quantization. It can be seen that the PDF-based integer quantization perfectly preserves the BER performance of the floating-point AMP detector.

4) Comparison with the uniform quantization scheme

The automatic mixed-precision quantization scheme proposed in this embodiment is compared with a uniform quantization scheme. The integer bit width for uniform quantization can be obtained by the PDF-based integer quantization and is 6 bits. As for the fractional bit width, FIG. 7 compares the BER performance of the floating-point AMP detector with that of uniformly quantized AMP detectors using different fractional bit widths. Uniformly quantized AMP detectors with 4-bit and 5-bit fractional bit widths suffer severe performance degradation, whereas a uniformly quantized AMP detector with a 6-bit fractional bit width almost perfectly recovers the performance of the floating-point AMP detector. The quantization scheme of the uniformly quantized AMP detector is therefore taken as 1-6-6. The average bit widths of the AMP detector based on automatic mixed-precision quantization and of the uniformly quantized AMP detector are compared in Table 1. Compared with the uniformly quantized AMP detector, the AMP detector based on automatic mixed-precision quantization reduces the integer bit width by 57.2% and the fractional bit width by 58%.

Table 1 compares the average integer and fractional bit widths of the AMP detector based on automatic mixed-precision quantization and of the uniformly quantized AMP detector.

Table 1

The above analysis shows that the automatic mixed-precision quantization method of this embodiment avoids a large amount of quantization bit width redundancy because it automatically allocates the quantization bit widths of the different intermediate variables in the MIMO detector.

The automatic mixed-precision quantization apparatus provided by the present invention is described below. The apparatus described below and the method described above may be read with reference to each other.

Please refer to FIG. 8, which is a schematic structural diagram of the automatic mixed-precision quantization apparatus provided by the present invention. As shown in FIG. 8, the apparatus may include:

a variable acquisition module 10, configured to acquire the intermediate variables to be quantized that are generated by the MIMO detector in the process of performing signal detection on the MIMO system;

a first quantization module 20, configured to train the agent through a deep reinforcement learning algorithm and to quantize the fractional bit width of each of the intermediate variables based on the policy stored in the trained agent;

a second quantization module 30, configured to quantize the integer bit width of each of the intermediate variables based on a probability density function.

Optionally, the first quantization module 20 includes:

an interaction unit, configured to initialize the environment in each round, make the agent interact with the environment based on the Markov decision process, and store the interaction data;

an updating unit, configured to control the agent, each time a preset number of rounds is reached, to update the policy stored in the agent based on the currently stored interaction data, until the maximum number of rounds is reached.

Optionally, the interaction unit is specifically configured to:

a statistical distribution unit, configured to convert the policy stored in the trained agent into a statistical distribution of the fractional bit width of each of the intermediate variables through a Monte Carlo simulation algorithm;

a fractional bit width quantization unit, configured to quantize the fractional bit width of the intermediate variable based on the statistical distribution of the fractional bit width of the intermediate variable;

Optionally, the statistical distribution unit is specifically configured to:

Optionally, the fractional bit width quantization unit is specifically configured to:

determine, based on the statistical distribution of the fractional bit width of the intermediate variable, the mean of that distribution as the fractional bit width of the intermediate variable.

Optionally, the second quantization module 30 is specifically configured to:

FIG. 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 9, the electronic device may include a processor 810, a communications interface 820, a memory 830, and a communication bus 840, where the processor 810, the communications interface 820, and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the automatic mixed-precision quantization method, the method including:

In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can perform the automatic mixed-precision quantization method provided by the methods above, the method including:

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, performs the automatic mixed-precision quantization method provided by the methods above, the method including:

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solutions, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An automatic mixed-precision quantization method, comprising:

acquiring intermediate variables to be quantized, which are generated by a multiple-input multiple-output (MIMO) detector in the process of performing signal detection on a MIMO system;

training an agent through a deep reinforcement learning algorithm, and quantizing the fractional bit width of each intermediate variable based on a policy stored in the trained agent;

and quantizing the integer bit width of each intermediate variable.

2. The method of claim 1, wherein training the agent through a deep reinforcement learning algorithm comprises:

in each round, initializing an environment, interacting the agent with the environment based on a Markov decision process, and storing interaction data;

and controlling the agent to update the policy stored in the agent based on the currently stored interaction data until the maximum number of rounds is reached.

3. The method of claim 2, wherein initializing the environment comprises:

randomly extracting a plurality of intermediate variables;

initializing the fractional bit widths of all intermediate variables to a preset maximum fractional bit width, and initializing the integer bit widths of all intermediate variables to a preset maximum integer bit width;

determining a plurality of states, wherein the plurality of states comprise the serial numbers and fractional bit widths corresponding to the plurality of intermediate variables;

randomly selecting and returning a state from the plurality of states.

4. The method of claim 3, wherein interacting the agent with the environment based on the Markov decision process and storing the interaction data comprises:

executing the following steps at each moment until the current moment reaches the maximum moment, and storing the interaction data generated at each moment, wherein the interaction data comprise a state, an action value, and a reward value:

determining an action value according to the current state and the current policy, wherein the action value represents the change in the fractional bit width;

modifying the current quantization scheme according to the action value to obtain a modified quantization scheme;

and evaluating the modified quantization scheme according to a reward function to obtain a reward value, and simultaneously selecting and returning the next state.

5. The method according to any one of claims 1 to 4, wherein quantizing the fractional bit width of each of the intermediate variables based on the policy stored in the trained agent comprises:

converting the policy stored in the trained agent into a statistical distribution of the fractional bit width of each intermediate variable through a Monte Carlo simulation algorithm;

quantizing the fractional bit width of the intermediate variable based on the statistical distribution of the fractional bit width of the intermediate variable;

wherein the policy represents a mapping between states and action values, the state comprises the serial number and the fractional bit width corresponding to an intermediate variable, and the action value represents the change in the fractional bit width.

6. The method of claim 5, wherein converting the policy stored in the trained agent into a statistical distribution of the fractional bit width of each intermediate variable through a Monte Carlo simulation algorithm comprises:

executing the following steps in each test until the maximum number of tests is reached, and obtaining the frequency with which each intermediate variable takes each fractional bit width, so as to obtain the statistical distribution of the fractional bit width of each intermediate variable:

for each intermediate variable, determining an action value based on the current state of the intermediate variable and the policy stored in the trained agent;

changing the fractional bit width of the intermediate variable based on the action value.

7. The method of claim 5, wherein quantizing the fractional bit width of the intermediate variable based on the statistical distribution of the fractional bit width of the intermediate variable comprises:

determining, based on the statistical distribution of the fractional bit width of the intermediate variable, the mean of that distribution as the fractional bit width of the intermediate variable.

8. The method of claim 1, wherein quantizing the integer bit width of each of the intermediate variables comprises:

generating a plurality of data of each intermediate variable through a Monte Carlo simulation algorithm to obtain a first data set;

extracting the data in the first data set that are not within the amplitude range of the quantization scheme to obtain a second data set, wherein the amplitude of the quantization scheme is the difference between 2 raised to the power of the integer bit width and 2 raised to the power of the negative fractional bit width;

determining the minimum integer bit width that meets a preset condition as the integer bit width of the intermediate variable, wherein the preset condition is that the ratio of the number of data in the second data set to the number of data in the first data set is less than or equal to a preset threshold.

9. An automatic mixed-precision quantization apparatus, comprising:

a variable acquisition module, configured to acquire intermediate variables to be quantized, which are generated by a MIMO detector in the process of performing signal detection on a MIMO system;

a first quantization module, configured to train an agent through a deep reinforcement learning algorithm and to quantize the fractional bit width of each intermediate variable based on a policy stored in the trained agent;

and a second quantization module, configured to quantize the integer bit width of each intermediate variable based on a probability density function.

10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the automatic mixed-precision quantization method of any one of claims 1 to 8.

11. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the automatic mixed-precision quantization method of any one of claims 1 to 8.