KR101965850B1

KR101965850B1 - A stochastic implementation method of an activation function for an artificial neural network and a system including the same

Info

Publication number: KR101965850B1
Application number: KR1020170053888A
Authority: KR
Inventors: 여인준; 이병근; 기상균
Original assignee: 광주과학기술원
Priority date: 2017-04-26
Filing date: 2017-04-26
Publication date: 2019-04-05
Also published as: KR20180120009A

Abstract

The present invention relates to the implementation of an activation function for an artificial neural network. Specifically, the invention can implement four different types of activation function by adjusting the inputs of a comparator that forms part of the artificial neural network.

Description

[0001] The present invention relates to a stochastic method of implementing an activation function used in an artificial neural network, and to a system including the same.

The present invention relates to a method of implementing an activation function used in an artificial neural network and to a recognition system that includes the activation function.

An artificial neural network (ANN) is a statistical network that processes data in a manner similar to a biological neural network. Artificial neural networks can be used in various fields such as character recognition, image recognition, speech recognition, and face recognition. An artificial neural network can be implemented in a very-large-scale integrated (VLSI) circuit. Like a biological neural network, an artificial neural network can include synapses and artificial neurons.

As technology advances, the complexity of artificial neural networks can keep increasing. Specifically, the number of synapses and the number of artificial neurons in the network can increase. As a result, the area and power consumption of an artificial neural network implemented in hardware can increase.

[1] K. Leboeuf, R. Muscedere, and M. Ahmadi, "Performance analysis of table-based approximations of the hyperbolic tangent activation function," in 2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), 2011, pp. 1-4.
[2] A. H. Namin, K. Leboeuf, R. Muscedere, H. Wu, and M. Ahmadi, "Efficient hardware implementation of the hyperbolic tangent sigmoid function," in 2009 IEEE International Symposium on Circuits and Systems, 2009, pp. 2117-2120.
[3] Y. Ji, F. Ran, C. Ma, and D. J. Lilja, "A hardware implementation of a radial basis function neural network using stochastic logic," in 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp. 880-883.
[4] M. Martincigh and A. Abramo, "A new architecture for digital stochastic pulse-mode neurons based on the voting circuit," IEEE Transactions on Neural Networks, vol. 16, pp. 1685-1693, 2005.
[5] H. Li, B. Liu, X. Liu, M. Mao, Y. Chen, Q. Wu, et al., "The applications of memristor devices in next-generation cortical processor designs," in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 2015, pp. 17-20.
[6] F. Tenore, R. J. Vogelstein, R. Etienne-Cummings, G. Cauwenberghs, and P. Hasler, "A floating-gate programmable array of silicon neurons for central pattern generating networks," in 2006 IEEE International Symposium on Circuits and Systems, 2006, pp. 4 pp.-3160.
[7] M. S. J. Tomlinson, D. J. Walker, and M. A. Sivilotti, "A digital neural network architecture for VLSI," in Neural Networks, 1990 IJCNN International Joint Conference on, 1990, pp. 545-550.
[8] S. Haykin, Neural Networks and Learning Machines (3rd Edition), Prentice Hall, 2009.
[9] P. Nuzzo, F. D. Bernardinis, P. Terreni, and G. V. d. Plas, "Noise Analysis of Regenerative Comparators for Reconfigurable ADC Architectures," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, pp. 1441-1454, 2008.
[10] Y. LeCun and C. Cortes, "The MNIST database of handwritten digits," ed, 1998.

The present invention can provide a method of implementing an activation function without an analog-to-digital converter, in contrast to activation functions implemented through conventional stochastic computing methods.

A recognition system comprising: a clock generator that generates a system clock; an analog random noise generator that generates an analog random signal; and a comparator that receives the analog random noise and a neuron weight and performs a comparison, wherein the comparator generates an activation function and performs the comparison through the generated activation function, and the activation function is at least one of a step function, a sigmoid function, an identity function, or a ReLU function.

FIG. 1 illustrates an example of an artificial neural network according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an example of the artificial neuron shown in FIG. 1.
FIG. 3 illustrates an analog-to-digital converter for stochastic computing according to an embodiment of the present invention.
FIG. 4 shows theoretical curves of the average output spike rate as a function of the noise distribution.
FIG. 5 shows the configuration of an analog random signal generator according to an embodiment of the present invention.
FIG. 6 shows the DAC output distribution for each case (M = 8, reference voltage = 1.5 V).
FIG. 7 shows a module including a stochastic computing activation function array according to an embodiment of the present invention.
FIG. 8 shows simulation results for four different types of activation function based on stochastic computing.
FIG. 9 shows the input terminal of a latch comparator according to an embodiment of the present invention.
FIGS. 10 and 11 show test results for evaluating the performance of the stochastic computing activation function array according to an embodiment of the present invention.
FIG. 12 shows the effect of DAC mismatch on recognition using the sigmoid activation function.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted. The suffixes "module" and "part" used for components in the following description are assigned or used interchangeably only for ease of drafting and do not by themselves have distinct meanings or roles. In describing the embodiments, detailed descriptions of related known art are omitted where they would obscure the gist of the embodiments. The accompanying drawings are intended only to make the embodiments easier to understand; they do not limit the technical idea disclosed in this specification, which should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms that include ordinals, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, or intervening components may be present. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this application, terms such as "comprise" or "have" are intended to indicate that the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification are present; they do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

FIG. 1 illustrates an example of an artificial neural network according to an embodiment of the present invention.

An artificial neural network (ANN, 10) is an artificially constructed network that resembles a biological neural network. Referring to FIG. 1, the artificial neural network 10 may include an input layer 11, a hidden layer 12, and an output layer 13.

The input layer 11, the hidden layer 12, and the output layer 13 may be connected to one another through synapses. Biologically, a synapse is the junction between one nerve cell and another. Similarly, in the artificial neural network 10, one artificial neuron can be coupled to another through a synapse.

The artificial neural network 10 may include a plurality of artificial neurons 100. The plurality of artificial neurons 100 may include input neurons that receive input data (X_1 to X_I) from outside, hidden neurons that receive data from the input neurons and process it, and output neurons that receive data from the hidden neurons and generate output data (Y_1 to Y_J).

The input layer 11 may include a plurality of input neurons, the hidden layer 12 may include a plurality of hidden neurons, and the output layer 13 may include a plurality of output neurons. The number of artificial neurons in each of the input layer 11, the hidden layer 12, and the output layer 13 is not limited to what is shown. The hidden layer 12 may include more layers than shown in FIG. 1, and the number of layers is related to the accuracy and learning speed of the artificial neural network 10.

For example, the input data may be image data representing a human face or handwriting. The artificial neural network 10 can recognize the image data and output what the image is.

FIG. 2 is a block diagram illustrating an example of the artificial neuron shown in FIG. 1.

Referring to FIG. 2, the artificial neuron 100 may include a summation circuit 110 and an activation function circuit 120.

The summation circuit 110 may sum the input signals (A_1 to A_K) using the weights (W_1 to W_K). Each input signal may represent an output signal generated by some artificial neuron. Each weight may represent the strength of a synapse (that is, the degree of coupling between one artificial neuron and another). Specifically, the summation circuit 110 may multiply each input signal by its weight and then add the products to generate a summation result (B). Accordingly, an input signal with a low weight contributes little to the summation result (B), and an input signal with a high weight contributes more. The summation result can be expressed by Equation 1.

The activation function circuit 120 may output an activation result (C) using the summation result and an activation function (AF). The activation result can be expressed by Equation 2.
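The equations themselves are not reproduced in this text. As a minimal sketch, assuming Equation 1 has the usual weighted-sum form B = sum over k of W_k * A_k and Equation 2 has the form C = AF(B), the two circuits can be modeled as follows. The function names and the example step function are illustrative and do not come from the patent.

```python
def summation(inputs, weights):
    """Weighted sum B = sum_k W_k * A_k (assumed form of Equation 1)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * a for w, a in zip(weights, inputs))


def neuron(inputs, weights, af):
    """Activation result C = AF(B) (assumed form of Equation 2)."""
    return af(summation(inputs, weights))


def step(b):
    """Hard-threshold activation, used here only as an example AF."""
    return 1.0 if b > 0 else 0.0
```

For inputs [1.0, 0.5, -1.0] and weights [0.2, 0.8, 0.1], the summation result is 0.2 + 0.4 - 0.1 = 0.5, so the step activation fires.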

FIG. 3 illustrates an analog-to-digital converter for stochastic computing according to an embodiment of the present invention.

FIG. 4 shows theoretical curves of the average output spike rate as a function of the noise distribution.

One of the key elements of an artificial neural network (ANN) is the activation function (AF). The activation function converts the weighted sum of a neuron's inputs into a firing-rate probability. A hardware implementation of the activation function requires complex circuitry and consumes considerable power, which makes it difficult to integrate many neurons on a single chip.

The following describes a circuit technique for realizing four different types of activation function (step, identity, rectified linear (ReLU), and sigmoid) based on stochastic computing. The activation function circuit according to an embodiment of the present invention is much simpler than existing circuits and consumes less power.

In an artificial neural network, the firing-rate probability of the postsynaptic potential is modeled by the activation function. Various activation functions can be used; a widely used one is the sigmoid function expressed by Equation 3.

Here, G is a gain factor that determines the slope of the curve.
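Equation 3 is not reproduced in this text. A minimal sketch, assuming it is the standard logistic sigmoid f(z) = 1 / (1 + exp(-G z)), shows how the gain factor G sets the slope: under this assumption the slope at z = 0 is G / 4. The function name and the numerically stable two-branch form are choices made for this illustration.

```python
import math


def sigmoid(z, gain=1.0):
    """Logistic sigmoid with gain factor G (assumed form of Equation 3)."""
    x = gain * z
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form that avoids overflow in exp().
    e = math.exp(x)
    return e / (1.0 + e)
```

With gain=4.0, the curve passes through 0.5 at z = 0 with a slope of about 1, and larger gains push the curve toward a step.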

An activation function can be implemented in two ways: by approximation or by stochastic computing (see prior art documents [3] and [4]).

In the approximation approach, look-up tables (see prior art document [1]) and piecewise linear approximations (PLAs) (see prior art document [2]) are commonly used. An accurate approximation requires a PLA with a sufficient number of linear segments, so the hardware becomes complex when accuracy is required.

In addition, a fixed form of the activation function can reduce synaptic plasticity. The stochastic computing approach, by contrast, can be implemented with relatively simple digital logic and therefore has an advantage over the approximation approach in terms of hardware complexity.

Existing stochastic computing methods concentrate on operation in the digital domain. Although stochastic computing is more area-efficient than approximation, a digital-domain implementation consumes more power and area than a comparatively simple analog implementation. In particular, multiplication requires multiple floating-point values to evaluate the activity of each synapse and neuron. As the number of synapses increases, the number of floating-point values required grows exponentially, which worsens the power and area problem.

Furthermore, when an artificial neural network is built from analog synapse devices with large storage capacity, such as memristors (see prior art document [5]) and floating gates (see prior art document [6]), an additional analog-to-digital converter (ADC) (see prior art document [7]) is needed to read the analog storage state.

The present invention proposes a simple architecture for VLSI implementation of an activation function that has a single floating point for sensing an analog voltage. The proposed architecture uses the central limit theorem to generate Gaussian random values and performs stochastic computing to generate the activation function using only one regenerative clocked comparator.

Stochastic computing according to an embodiment of the present invention results from applying the laws of probability to a regenerative clocked comparator, one of whose inputs is analog random noise and whose output is a digital bitstream represented by pulse-rate (frequency) modulation. The average pulse rate represents the firing-rate probability, which in this case is the activation function. The stochastic computing configuration according to an embodiment of the present invention includes an analog-to-pulse converter (A2P) and an analog random noise generator (RNG).

In general, a comparator is used in an artificial neural network to implement a hard threshold (step) activation function. As shown in FIG. 3, random noise may be applied to an input terminal of the regenerative clocked comparator. If the noise (V_n) is described as an equivalent Gaussian random variable with zero mean and standard deviation σ_n, the probability that the comparator output spike is "HIGH" (logic 1) is expressed by Equations 4 and 5.
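Equations 4 and 5 are not reproduced in this text. As a hedged sketch, assume the comparator outputs HIGH whenever its signal input z exceeds the noise sample V_n. For zero-mean Gaussian noise, the probability of HIGH is then the Gaussian cumulative distribution function evaluated at z, which can be written with the error function. The function name and the symbol z for the signal input are illustrative.

```python
import math


def p_high_gaussian(z, sigma_n):
    """P(output is HIGH) = P(V_n < z) for V_n ~ N(0, sigma_n**2).

    This is the standard Gaussian CDF; treating it as the content of
    Equations 4 and 5 is an assumption made for this sketch.
    """
    if sigma_n <= 0:
        raise ValueError("sigma_n must be positive")
    return 0.5 * (1.0 + math.erf(z / (sigma_n * math.sqrt(2.0))))
```

The result is an S-shaped curve in z: it equals 0.5 at z = 0 and approaches a hard step as sigma_n shrinks.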

Let v_j be the output activity of the j-th neuron; then w_ij is the weight of the connection from the j-th neuron to the i-th neuron, and z_i is the weighted sum for the i-th neuron. If N trials are made for the same z_i with independent noise, the digital output spikes follow the binomial distribution B(N, P). The average spike rate (S), with mean μ_s and standard deviation σ_s, is expressed by Equations 6 and 7.
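Equations 6 and 7 are not reproduced in this text. A minimal sketch, assuming they are the standard results for a binomial proportion (mean P and standard deviation sqrt(P(1 - P) / N) for the average of N independent comparisons), is given below together with a small Monte Carlo model of the comparator. The function names and the software random number generator are stand-ins for illustration.

```python
import math
import random


def spike_rate_stats(p, n):
    """Mean and standard deviation of S = K / N with K ~ Binomial(N, p)."""
    return p, math.sqrt(p * (1.0 - p) / n)


def simulate_rate(z, sigma_n, n, rng):
    """Average of n comparator decisions for a fixed input z.

    Each decision is HIGH when z exceeds a fresh zero-mean Gaussian
    noise sample with standard deviation sigma_n.
    """
    highs = sum(1 for _ in range(n) if z > rng.gauss(0.0, sigma_n))
    return highs / n
```

Under this model, quadrupling N halves the standard deviation of the measured rate, which is the accuracy versus power trade-off that the number of trials controls.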

As shown in FIG. 4, the average rate of the comparator output depends on z_i. The slope of S decreases monotonically as σ_n increases. Conversely, when σ_n is very small, the function shown in FIG. 4 takes the form of a step function.

On the other hand, when the random signal (V_n) is an equivalent uniform random variable with zero mean, the probability that the comparator output is "HIGH" is given by Equation 8.

When the comparison is repeated multiple times for a given z_i, the average rate (L) of the comparator output has a mean μ_L and a standard deviation σ_L given by Equations 6 and 7 with P_G replaced by P_U. Finally, L corresponds to the identity activation function, as shown in FIG. 4.
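Equation 8 is not reproduced in this text. As a sketch, assume the noise is uniform on a symmetric interval [-a, +a] and the comparator outputs HIGH when z exceeds the noise. The probability of HIGH is then linear in z inside the interval and saturates outside it, which is why the averaged output behaves as an identity (linear) activation. The parameter name half_range stands for a and is an assumption.

```python
def p_high_uniform(z, half_range):
    """P(output is HIGH) = P(V_n < z) for V_n uniform on [-a, +a]."""
    if half_range <= 0:
        raise ValueError("half_range must be positive")
    p = 0.5 + z / (2.0 * half_range)
    # Outside the noise range the comparator output is deterministic.
    return min(1.0, max(0.0, p))
```

For example, with half_range = 1.0, an input of 0.5 gives a probability of 0.75, and any input above 1.0 gives 1.0.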

As Equation 7 shows, the deviation of the realized activation function depends on the number of trials. Increasing N to ensure adequate accuracy increases power consumption. However, this can be mitigated to some extent because artificial neural networks offer high noise tolerance.

An N-bit linear feedback shift register (LFSR) generates a uniformly distributed pseudo-random bitstream with a period of up to 2^N - 1. This uniform distribution is converted to a Gaussian distribution by means of the central limit theorem. According to the central limit theorem, as the sample size increases, the generated samples approximate a Gaussian distribution with zero mean and unit variance. A digital-to-analog converter (DAC) is needed to convert the pseudo-random samples into an analog voltage. Because the DAC has a linear one-to-one relationship between its digital input and analog output, the Gaussian or uniform characteristic is preserved.
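The LFSR behaviour described above can be sketched in software. The example below uses a widely published 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11, which is maximal-length (period 2^16 - 1 = 65535). The patent does not state its feedback polynomial, so the taps, the seed 0xACE1, and the function names are assumptions. The helper clt_sample averages several LFSR words as a rough software illustration of the central limit theorem step.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one bit."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed=0xACE1):
    """Count steps until the register returns to its seed.

    The all-zero state is a lock-up state and must not be used as a seed.
    """
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


def clt_sample(state, n_words=4):
    """Average n_words LFSR bytes; returns (value in [0, 255], new state)."""
    total = 0
    for _ in range(n_words):
        for _ in range(8):      # clock out 8 fresh bits per word
            state = lfsr16_step(state)
        total += state >> 8     # upper byte serves as a uniform word
    return total / n_words, state
```

Averaging more words narrows the distribution and brings its shape closer to Gaussian, at the cost of more clock cycles per sample.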

FIG. 5 shows the configuration of an analog random signal generator according to an embodiment of the present invention.

There are two XOR-based LFSRs, each N bits long. Each is divided into two groups, MSB and LSB, so N = 2M. The two LFSRs are given different initial conditions in order to produce better-uncorrelated pseudo-random signals. Likewise, to keep the data uncorrelated during decimation, data from the same group is input to the adder. Addition is performed with simple full adders. After decimation, the LSB of the (M+1)-bit adder output is truncated and the remaining bits are supplied to the M-bit DAC. The MUX controls which type of random bitstream is applied to the DAC.
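One possible reading of this data path is sketched below. It assumes M = 8, that the adder sums the MSB groups of the two LFSR states, and that the MUX selects between this sum (a first step toward a Gaussian shape) and a single MSB group (uniform). These choices, the function names, and the ideal DAC transfer function are assumptions for illustration and are not confirmed details of the circuit in FIG. 5. Note that the sum of only two uniform words has a triangular distribution, which is a coarse approximation of a Gaussian.

```python
M = 8                      # DAC resolution in bits (FIG. 6 uses M = 8)
MASK = (1 << M) - 1


def split_groups(state):
    """Split a 2M-bit LFSR state into its (MSB group, LSB group)."""
    return (state >> M) & MASK, state & MASK


def dac_code(state_a, state_b, gaussian_like):
    """Select the M-bit code sent to the DAC (the role of the MUX)."""
    msb_a, _ = split_groups(state_a)
    msb_b, _ = split_groups(state_b)
    if gaussian_like:
        # (M+1)-bit sum of same-group data with the LSB truncated.
        return (msb_a + msb_b) >> 1
    return msb_a           # uniform: pass one group straight through


def dac_voltage(code, v_ref=1.5):
    """Ideal linear DAC: code 0 maps to 0 V, full scale approaches v_ref."""
    return v_ref * code / (1 << M)
```

Because the DAC is linear, whatever distribution the codes have is carried over unchanged to the analog noise voltage.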

FIG. 6 shows the DAC output distribution for each case (M = 8, reference voltage = 1.5 V).

The operating principle of a single activation function structure has been described above. The hardware implementation is described below.

FIG. 7 shows a module including a stochastic computing activation function array according to an embodiment of the present invention.

As shown in FIG. 7, the module 400 including the stochastic computing activation function array according to an embodiment of the present invention includes an analog random noise generator (RNG, 410), 32 latch comparators 420, a clock generator 430, and a multiplexer 440.

The analog random noise generator is as described with reference to FIG. 5. Specifically, it includes a digital-to-analog converter 411 and a linear feedback shift register (LFSR, 412).

The latch comparator 420 is as described with reference to FIG. 3. The negative input nodes of the 32 latch comparators share the same analog RNG output. Because the latch comparators operate simultaneously, parallel processing is possible.

The clock generator 430 generates the system clock used by the system.

The multiplexer 440 multiplexes the signals received from the latch comparators and outputs the result.

FIG. 8 shows simulation results for four different types of activation function based on stochastic computing.

In a first embodiment, to generate a step-type activation function, the latch comparator performs a single comparison against the common-mode voltage (V_CM) of the analog RNG.

In a second embodiment, to generate a sigmoid-type activation function, the latch comparator performs multiple comparisons against Gaussian-distributed noise.

In a third embodiment, to generate an identity-type activation function, the Gaussian-distributed noise is replaced with uniformly distributed noise and the sigmoid-type procedure is followed.

In a fourth embodiment, to generate a ReLU-type activation function, the reference voltage of the DAC is raised from 0 to V_CM.
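The four embodiments can be compared with a small behavioural simulation. The sketch below models the comparator as outputting HIGH when the input z exceeds the noise sample. It assumes V_CM = 0.75 V (half of the 1.5 V reference), a Gaussian standard deviation of 0.15 V, and that the ReLU mode corresponds to raising the lower end of the uniform noise range to V_CM. These values and this reading of the fourth embodiment are assumptions, and the software random number generator stands in for the LFSR and DAC.

```python
import random

V_REF = 1.5    # DAC reference voltage in volts (from FIG. 6)
V_CM = 0.75    # assumed common-mode voltage: half of V_REF
SIGMA = 0.15   # assumed noise standard deviation for the sigmoid mode


def noise_sample(mode, rng):
    """One analog noise value applied to the comparator's negative input."""
    if mode == "step":
        return V_CM                        # fixed threshold, no randomness
    if mode == "sigmoid":
        return rng.gauss(V_CM, SIGMA)      # Gaussian noise around V_CM
    if mode == "identity":
        return rng.uniform(0.0, V_REF)     # uniform over the full range
    if mode == "relu":
        return rng.uniform(V_CM, V_REF)    # uniform over the upper half only
    raise ValueError("unknown mode: " + mode)


def activation(z, mode, n_trials, rng):
    """Average comparator output over n_trials comparisons of input z."""
    highs = sum(1 for _ in range(n_trials) if z > noise_sample(mode, rng))
    return highs / n_trials
```

In this model the ReLU output is zero for any input below V_CM and rises linearly above it, while the step mode needs only a single comparison because its threshold is not random.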

도 9는 본 발명의 일 실시 예에 따른 래치 비교기의 입력 단자를 나타낸다.9 shows an input terminal of a latch comparator according to an embodiment of the present invention.

래치 비교기는 확률적 컴퓨팅을 위한 핵심 구성 요소이다. 래치 비교기의 입력 단자는 일반적으로 NMOS 쌍 또는 PMOS 쌍으로 구성된다. 단일 종단형(single-ended) 회로의 경우, 트랜지스터의 임계 값 때문에 입력 전압 범위에 제한이 있다. 이러한 문제를 해결하기 위해 도 5에 도시된 바와 같이, 상보적인 트랜지스터가 입력 단자로 사용된다. 결과적으로 입력 신호가 지나치게 낮거나 높을지라도 PMOS 또는 NMOS가 작동한다. 트랜지스터의 크기는 비교기의 고유한 열 잡음을 최소화한 결과이다. 열잡음은 가우스 분포를 따르기 때문에 선형 활성 함수(아이덴티티, ReLU) 변환에 변형이 생길 수 있다. 트랜지스터 폭을 증가 시키면 입력 레퍼런스 노이즈를 감소시킬 수 있다. Latch comparators are a key component for probabilistic computing. The input terminal of the latch comparator is generally comprised of an NMOS pair or a PMOS pair. For single-ended circuits, the input voltage range is limited due to the threshold of the transistor. To solve this problem, as shown in Fig. 5, a complementary transistor is used as an input terminal. As a result, the PMOS or NMOS operates even if the input signal is too low or high. The size of the transistor is the result of minimizing the inherent thermal noise of the comparator. Since thermal noise follows the Gaussian distribution, transformations in the linear activation function (identity, ReLU) can occur. Increasing the transistor width can reduce the input reference noise.

도 10 및 도 11은 본 발명의 일 실시 예에 따른 확률적 컴퓨팅 활성 함수 어레이의 성능을 평가하기 위한 테스트 결과를 나타낸다.10 and 11 illustrate test results for evaluating the performance of probabilistic computing active function arrays in accordance with an embodiment of the present invention.

도 10은 본 발명의 일 실시 예에 따른 확률적 컴퓨팅 네트워크를 나타낸다.10 illustrates a stochastic computing network according to an embodiment of the present invention.

As shown in FIG. 10, the MNIST data set (prior art document 10) was used for the test. Each sample is an 8-bit grayscale, 28 x 28 pixel image of a handwritten digit (0-9). The stochastic computing network shown in FIG. 10 has 800 neurons (25 modules) in the visible layer, 320 neurons (10 modules) in the hidden layer, and 32 neurons (1 module) in the output layer. An MNIST image has 784 pixels. However, because the activation function array according to an embodiment of the present invention is organized in units of 32, a null value that does not affect training is applied to the remaining 16 neurons of the visible layer. Likewise, only 10 of the neurons in the output layer are used. The weights are trained by conventional supervised learning based on back-propagation.
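The layer sizes follow from rounding each layer up to a whole number of 32-unit modules. The sketch below reproduces that arithmetic; the helper names `modules_needed` and `pad_to_modules` are hypothetical, and the use of 0 as the null value is an assumption, since the text only says that the null value does not affect training.

```python
MODULE_SIZE = 32  # units per activation function array module, as described above


def modules_needed(n_neurons, module_size=MODULE_SIZE):
    """Number of modules required to hold n_neurons (ceiling division)."""
    return -(-n_neurons // module_size)


def pad_to_modules(pixels, null_value=0, module_size=MODULE_SIZE):
    """Pad a flat pixel list with null values up to a whole number of modules."""
    total = modules_needed(len(pixels), module_size) * module_size
    return list(pixels) + [null_value] * (total - len(pixels))
```

With a 784-pixel image, `modules_needed(784)` gives 25 and `pad_to_modules` returns 800 values, the last 16 of which are null.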

FIG. 11(a) shows the recognition rate as a function of the number of trials when the sigmoid activation function is used.

FIG. 11(b) shows the recognition rate as a function of the number of trials when the ReLU activation function is used.

In each case, the stochastic-computing sigmoid activation function or ReLU activation function is used. As shown in FIG. 11(a), with the sigmoid activation function, handwriting recognition succeeds once at least 32 comparisons are performed. As shown in FIG. 11(b), with the ReLU activation function, handwriting recognition succeeds once at least 64 comparisons are performed.
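One way to see why the number of comparisons matters is to treat each comparison as an independent Bernoulli trial. That is an assumption made here for illustration and is not an analysis given in this document. Under that assumption, the standard error of an activation value estimated from N comparisons is sqrt(p(1 - p) / N). The function name below is hypothetical.

```python
import math


def comparison_standard_error(p, n_compare):
    """Standard error of an activation estimated from n_compare Bernoulli comparisons."""
    return math.sqrt(p * (1.0 - p) / n_compare)
```

At p = 0.5 this gives about 0.088 for 32 comparisons and 0.0625 for 64, so doubling the number of comparisons reduces the uncertainty by a factor of about 1.41.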

FIG. 12 shows the effect of DAC mismatch on recognition using the sigmoid activation function.

FIG. 12 shows, as an example, a case in which the DAC mismatch is 20 percent. Mismatch caused by process variation is generally regarded as a major cause of performance degradation in VLSI systems. However, as shown in FIG. 12, DAC mismatch does not affect the stochastic computing according to an embodiment of the present invention.

As described above, the present invention has been described with reference to specific matters such as concrete components, and to limited embodiments and drawings. These are provided only to aid an overall understanding of the invention. The present invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should not be confined to the described embodiments, and not only the claims set out below but also everything equivalent to, or equivalently modified from, those claims falls within the scope of the spirit of the present invention.

Claims

A recognition system using an artificial neural network, comprising:
a clock generator for generating a system clock;
an analog random noise generator for generating an analog random signal; and
a comparator for receiving the analog random noise and neuron weights and performing a comparison,
wherein the comparator generates an activation function of any one of a step type, a sigmoid type, an identity type, and a ReLU type, and performs the comparison through the generated activation function, and
wherein the comparator
performs a single comparison with the common-mode voltage of the analog random noise generator when generating the step-type activation function,
performs multiple comparisons with Gaussian noise when generating the sigmoid-type activation function,
replaces the Gaussian noise with uniform noise when generating the identity-type activation function, and
performs the comparison, when generating the ReLU-type activation function, with the reference voltage of a digital-to-analog converter, which converts an input random sample into an analog voltage, raised from 0 to the common-mode voltage.