KR20190013162A

KR20190013162A - Method for convolution operation reduction and system for performing the same

Info

Publication number: KR20190013162A
Application number: KR1020170097323A
Authority: KR
Inventors: 유승주
Original assignee: 서울대학교산학협력단
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2019-02-11
Also published as: KR102034659B1

Abstract

Embodiments of the present invention relate to a hardware accelerator for a convolutional neural network and to a method for reducing the amount of convolution computation. More specifically, they relate to a method that reduces the amount of convolution computation by stopping an in-progress convolution operation whose output value is predicted to fall within a set range and performing the next convolution in sequence, and to a hardware accelerator that performs the method.

Description

{METHOD FOR CONVOLUTION OPERATION REDUCTION AND SYSTEM FOR PERFORMING THE SAME}

Embodiments disclosed herein relate to a hardware accelerator for a convolutional neural network and to a method for reducing the amount of convolution computation. More specifically, they relate to a method that reduces the amount of convolution computation by stopping a convolution operation whose output value is predicted to fall within a set range and performing the next convolution in sequence, and to a hardware accelerator that performs the method.

Deep neural networks are generally divided into convolutional neural networks (CNNs), which use convolution, and recurrent neural networks (RNNs), whose main computation is matrix-vector multiplication. CNNs are known to be well suited to image processing, and RNNs to sequential data processing. In a CNN, convolution accounts for most of the computation time. In each layer of a CNN, an output is obtained by convolving the input data with a kernel, and an activation function (typically ReLU) is then applied to obtain the output of that layer.

Because convolution accounts for most of the computation in a CNN, the efficiency of the convolution operation must be improved to shorten the overall computation time.

In this regard, Korean Patent No. 10-1563569, a prior art document, discloses a system and method for recognizing dynamic visual image patterns through pre-training on an example pattern set. However, the convolution scheme disclosed in that document does little to improve the efficiency of the convolution operation and therefore leaves room for improvement.

The background art described above is technical information that the inventor possessed for, or acquired in the course of, deriving the present invention, and is not necessarily known art disclosed to the general public before the filing of the present application.

An object of the embodiments disclosed herein is to disclose a hardware accelerator for a convolutional neural network and a method for reducing the amount of convolution computation.

The embodiments also aim to reduce the amount of convolution computation, when running a convolutional neural network, by predicting the output value of a convolution operation in advance and stopping convolution operations that would be meaningless.

The embodiments also aim to reduce the amount of convolution computation, when running a convolutional neural network, by predicting the result of a convolution operation at an early stage.

The embodiments also aim to reduce the amount of convolution computation effectively, when running a convolutional neural network, by finding an appropriate balance between the accuracy of the output value and the rate of computation reduction.

As a technical means for achieving the above technical object, according to one embodiment, a method for reducing the amount of convolution computation is disclosed. The method is performed by a hardware accelerator for a convolutional neural network (CNN) and includes: predicting, in performing a plurality of convolutions on input data, whether the output value of each convolution will fall within a set range; and stopping a convolution operation whose output value is predicted to fall within the set range and performing the next convolution in sequence.

According to another embodiment, a hardware accelerator for a convolutional neural network (CNN) is disclosed. In performing a plurality of convolutions on input data, the hardware accelerator predicts whether the output value of each convolution will fall within a set range, stops a convolution operation whose output value is predicted to fall within the set range, and performs the next convolution in sequence.

According to yet another embodiment, a computer-readable recording medium is disclosed on which a program for performing a method for reducing the amount of convolution computation is recorded. The method is performed by a hardware accelerator for a convolutional neural network (CNN) and may include: predicting, in performing a plurality of convolutions on input data, whether the output value of each convolution will fall within a set range; and stopping a convolution operation whose output value is predicted to fall within the set range and performing the next convolution in sequence.

According to yet another embodiment, a computer program is disclosed that is executed by a hardware accelerator for a neural network and stored on a medium in order to perform a method for reducing the amount of convolution computation. The method is performed by a hardware accelerator for a convolutional neural network (CNN) and may include: predicting, in performing a plurality of convolutions on input data, whether the output value of each convolution will fall within a set range; and stopping a convolution operation whose output value is predicted to fall within the set range and performing the next convolution in sequence.

According to any one of the above means for solving the problem, the embodiments disclosed herein can provide a hardware accelerator for a convolutional neural network and a method for reducing the amount of convolution computation.

In running a convolutional neural network, the embodiments can also reduce the amount of convolution computation by predicting the output value of a convolution operation in advance and stopping convolution operations that would be meaningless.

In running a convolutional neural network, the embodiments can also reduce the amount of convolution computation by predicting the result of a convolution operation at an early stage.

In running a convolutional neural network, the embodiments can also reduce the amount of convolution computation effectively by finding an appropriate balance between the accuracy of the output value and the rate of computation reduction.

Such a hardware accelerator for a convolutional neural network and such a method for reducing the amount of convolution computation can effectively reduce the amount of computation in deep-learning-based servers, mobile systems, and the like.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the disclosed embodiments belong.

FIG. 1 is an exemplary diagram for explaining a neural network according to an embodiment.
FIG. 2 is a block diagram illustrating the configuration of a neural network system according to an embodiment.
FIG. 3 is a configuration diagram for explaining a hardware accelerator according to an embodiment.
FIGS. 4 and 5 are exemplary diagrams for explaining a convolution operation performed in a hardware accelerator according to an embodiment.
FIGS. 6 and 7 are flowcharts for explaining a method for reducing the amount of convolution computation according to an embodiment.

Various embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below may be modified and practiced in various different forms. To describe the features of the embodiments more clearly, detailed descriptions of matters widely known to those of ordinary skill in the art to which the embodiments belong are omitted. In the drawings, parts unrelated to the description of the embodiments are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case where it is directly connected but also the case where it is connected with another component in between. In addition, when a component is said to "include" another component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

FIG. 1 is an exemplary diagram for explaining a neural network executed in a neural network system according to an embodiment.

A convolutional neural network derives output data by convolving input data with each kernel. Here, the convolution includes applying the weights (w) corresponding to each kernel when processing the input data.

Referring to FIG. 1, convolution includes multiplication operations performed on a plurality of input data and a plurality of kernels, and a process of combining the plurality of results thus obtained.
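The multiply-then-combine structure described above can be sketched for a single output value as follows. This is a minimal illustration and not the accelerator's implementation: the function name `conv_output_value`, the nested-list data layout, and the use of plain addition as the combining step are assumptions made for the example.

```python
def conv_output_value(patches, kernels):
    """Return one convolution output value (hypothetical helper).

    patches[c] and kernels[c] are same-sized 2D lists holding the input
    window and the kernel weights for input feature map c.
    """
    total = 0.0
    for patch, kernel in zip(patches, kernels):
        # Multiply step: element-wise product of one input window with its kernel.
        partial = sum(x * w
                      for row_x, row_w in zip(patch, kernel)
                      for x, w in zip(row_x, row_w))
        # Combine step (assumed here to be addition): fold the result into the total.
        total += partial
    return total


patches = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
kernels = [[[1, 0], [0, 1]], [[2, 2], [2, 2]]]
print(conv_output_value(patches, kernels))
```

Each pass through the loop corresponds to one multiplication operation between an input and a kernel, and the running `total` is what later paragraphs refer to as an intermediate result.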

The convolution shown in FIG. 1 may be repeated for each layer. That is, the output data derived through convolution is fed in as the input data of the next layer, where convolution is performed again. This convolution is repeated across a plurality of layers until the final output value is derived. More specific embodiments are described later in the relevant parts.

FIG. 2 is a block diagram illustrating the configuration of a neural network system 100 according to an embodiment.

The neural network system 100 according to an embodiment disclosed herein is, in particular, a system for running a convolutional neural network (CNN) and may include a hardware accelerator. The specific configuration is described with reference to FIG. 2.

Referring to FIG. 2, the neural network system 100 according to an embodiment may include an input/output device 110, a storage device 120, and a computing device 130.

The input/output device 110 according to an embodiment may include an input device for receiving input from a user and an output device for displaying information such as the result of a task or the state of the neural network system 100. For example, the input/output device 110 may include an input device for receiving a data processing command and an output device for outputting the result processed according to the received command.

The storage device 120 may store data for running the neural network. For example, it may store the input data to be processed by the convolutional neural network, and it may store output data as the result of running the convolutional neural network.

The storage device 120 may include a solid state drive (SSD), flash memory, magnetic random access memory (MRAM), phase change RAM (PRAM), ferroelectric RAM (FeRAM), a hard disk, and the like, and may also include static random access memory (SRAM), dynamic random access memory (DRAM), and the like.

The computing device 130 controls the overall operation of the neural network system 100 and may include a processor such as a CPU. The computing device 130 may also include a hardware accelerator. The hardware accelerator may perform convolution operations for the convolutional neural network and may be implemented as a GPU.

Next, FIG. 3 is a configuration diagram illustrating an embodiment of part of the configuration of the neural network system 100.

According to FIG. 3, the neural network system 100 may include an off-chip DRAM 121 as the storage device 120 and a hardware accelerator 131 as the computing device 130. The off-chip DRAM 121 and the hardware accelerator 131 may be connected through a DMA unit 141 and a global buffer 142. The off-chip DRAM 121 and the hardware accelerator 131 exchange data by direct memory access (DMA) through the DMA unit 141, and the exchanged data may be held temporarily in the global buffer 142.

Also according to the embodiment of FIG. 3, the hardware accelerator 131 may be composed of an array of a plurality of processing elements (PEs) 132 (PE array). The hardware accelerator 131 may read the data stored in the global buffer 142 and perform convolution through each processing element 132.

In this regard, FIG. 4 shows an example of an embodiment in which each processing element performs convolution. According to the embodiment of FIG. 3, each processing element may store the input data to be convolved in a register file (RF). A fetch controller may select input data from the RF, perform convolution by applying the kernel weights and the activation function (act.), and store the result back in the RF. The fetch controller may perform this convolution process repeatedly.

Meanwhile, in performing convolution as described above, the hardware accelerator 131 according to the embodiment may perform a method for reducing the amount of convolution computation. The hardware accelerator 131 may perform convolution for a plurality of layers, and the reduction method according to the embodiment may be applied to each layer. The description below is based on a single layer and assumes a plurality of layers where necessary.

According to the embodiment, in performing a plurality of convolutions on input data and kernels, the hardware accelerator 131 for a convolutional neural network may predict whether the output value of each convolution will fall within a set range. The hardware accelerator 131 may stop any convolution whose output value is predicted to fall within the set range and perform the remaining convolutions. The set range may be set to negative values.

In this regard, a convolutional neural network may apply a rectified linear unit (ReLU) as the activation function after convolution. If the convolution output value is negative, the result of the ReLU operation is 0.

Accordingly, when the convolution output value is predicted to be negative, the hardware accelerator 131 can reduce the amount of convolution computation by stopping the convolution, outputting 0 as the result of the ReLU operation, and performing the next convolution in sequence.

In predicting whether the output value of a convolution will fall within the set range, the hardware accelerator 131 may derive an intermediate result of the output value and predict, based on the derived intermediate result, whether the output value will fall within the set range. For example, it may compare the intermediate result with a threshold to make this prediction.

In this regard, the input data may include a plurality of input feature maps. The hardware accelerator 131 may sequentially perform multiplication operations between the plurality of input feature maps and the kernel, and derive the output value of the convolution by combining the results of the multiplication operations.

That is, multiplying the kernel with each of the plurality of input feature maps involves several multiplication operations, and an intermediate result can be derived by combining the results of the multiplication operations performed up to the end of each one.

For example, at the end of the first multiplication operation, its result is the intermediate result; at the end of the second, the combination of the first and second results is the intermediate result; and at the end of the N-th, the value obtained by combining the results of the first through N-th multiplication operations can be derived as the intermediate result of the convolution for the corresponding input feature map.

Based on the intermediate result derived in this way, the hardware accelerator 131 may predict whether the output value of the convolution will fall within the set range, for example whether it will be negative, and may stop the convolution if the output value is predicted to be negative.
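The early-stop decision described above can be sketched in software as follows. This is only an illustration under stated assumptions: the function name `relu_with_early_stop` is hypothetical, each element of `products` stands for the result of one multiplication operation between an input feature map and the kernel, the results are combined by addition, and the prediction rule "intermediate result below the threshold means the final output will be negative" is one possible reading of the threshold comparison, which the text does not spell out.

```python
def relu_with_early_stop(products, threshold):
    """Return (relu_output, products_consumed) for one convolution.

    products  : per-feature-map multiplication results, in processing order.
    threshold : assumed decision level; a running sum below it is taken as
                a prediction that the final output will be negative.
    """
    partial = 0.0
    for used, p in enumerate(products, start=1):
        partial += p  # intermediate result after this multiplication
        if partial < threshold:
            # Predicted negative: stop here and emit the ReLU result 0.
            return 0.0, used
    # No early stop: apply ReLU to the fully computed output.
    return max(partial, 0.0), len(products)


print(relu_with_early_stop([-5.0, 1.0, 1.0], threshold=-3.0))
print(relu_with_early_stop([1.0, -0.5, 2.0], threshold=-3.0))
```

In the first call the running sum falls below the threshold after one product, so the remaining two multiplications are skipped; in the second call the convolution runs to completion.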

Also, when the input data includes a plurality of input feature maps, the hardware accelerator 131 may sequentially perform the multiplication operations between the input feature maps and the kernel while sorting the order in which the input feature maps are processed. The hardware accelerator 131 may sort the processing order of the input feature maps based on the kernel weights corresponding to them.

In this regard, according to the embodiment, the hardware accelerator 131 may obtain the plurality of input feature maps from a convolution operation. In other words, when the convolutional neural network run by the hardware accelerator 131 has a plurality of layers, the input feature map fed to one layer may be the output feature map of the convolution operation performed in the previous layer.

Also, in performing convolution in each layer, there may be one or more kernels to be applied to the input data in that layer. That is, the hardware accelerator 131 may perform many-to-many convolution operations on a plurality of input feature maps and a plurality of kernels, and combine the results to derive output feature maps. Each output feature map may therefore have weights according to its corresponding kernel. That is, each input feature map for a given layer may have corresponding weights according to the convolution operation performed in the preceding layer.

Using this, the hardware accelerator 131 may sort the processing order of the input feature maps based on the weights of the preceding layer's kernels corresponding to the input feature maps.

Here, the hardware accelerator 131 may sort the processing order of the input feature maps according to the corresponding kernel weights, in descending order of the absolute value of the weights. The larger the absolute value of a weight, the greater its influence on the output value of the convolution, so sorting the processing order this way allows the output value to be predicted early and meaningless computation to be stopped early.
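A minimal sketch of this ordering step is shown below. The text does not say how a kernel with several weights is reduced to a single sort key, so the sum of absolute weights used here is an assumption made for illustration, as are the function name `order_by_weight_magnitude` and the flat-list layout.

```python
def order_by_weight_magnitude(kernels):
    """Return input-feature-map indices, largest weight magnitude first.

    kernels[c] is a flat list of the weights applied to input feature map c.
    The sort key (sum of absolute weights) is an assumed summary statistic.
    """
    magnitude = [sum(abs(w) for w in k) for k in kernels]
    return sorted(range(len(kernels)), key=lambda c: magnitude[c], reverse=True)


kernels = [[0.1, -0.2], [-3.0, 1.0], [0.5, 0.5]]
print(order_by_weight_magnitude(kernels))
```

Because the kernel weights are fixed after training, an ordering of this kind could be computed once ahead of time rather than during inference.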

Meanwhile, according to the embodiment, the hardware accelerator 131 may set the threshold. The amount of computation reduction and the accuracy of the output value are negatively correlated. That is, reducing computation heavily lowers accuracy, and raising accuracy increases computation. The hardware accelerator 131 may therefore set a threshold that reduces computation as much as possible within the tolerance for accuracy.

For example, the hardware accelerator 131 may perform convolution with various sample values substituted for the threshold, analyze the accuracy of the resulting output values and the amount of computation reduction, and set the threshold based on the relationship between accuracy and computation reduction.

For example, when the hardware accelerator 131 runs a trained neural network with N layers, N-1 of the layers are set to perform every convolution without stopping, and for the one remaining layer, convolution is performed while the sample value corresponding to the threshold is varied from small to large. The output values are analyzed to compute the relationship (A) between the accuracy (quality) of the output and the threshold. If a tolerance is set for accuracy, the threshold at the lowest permitted accuracy can be obtained. In the same way, the relationship (B) between the amount of computation reduction and the threshold can be computed. This process may be carried out for each layer.
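The threshold sweep for a single layer can be sketched as follows. This is a toy illustration under several assumptions: a layer is modeled as a list of convolutions, each given as its sequence of multiplication results; accuracy loss is measured as the mean absolute error of the layer's ReLU outputs rather than the quality of the network's final output; and the names `run_layer` and `sweep_thresholds` are hypothetical.

```python
def run_layer(convs, threshold):
    """Return (relu_outputs, products_consumed) for one layer at a threshold."""
    outputs, used = [], 0
    for products in convs:
        partial, stopped = 0.0, False
        for p in products:
            partial += p
            used += 1
            if partial < threshold:  # assumed early-stop rule
                stopped = True
                break
        outputs.append(0.0 if stopped else max(partial, 0.0))
    return outputs, used


def sweep_thresholds(convs, candidates, tolerance):
    """Return (threshold, reduction, error) with the largest reduction
    whose error stays within the tolerance, or None if none qualifies."""
    # Baseline: a threshold of -inf never triggers an early stop.
    exact, full = run_layer(convs, float("-inf"))
    best = None
    for t in candidates:
        outputs, used = run_layer(convs, t)
        error = sum(abs(a - b) for a, b in zip(exact, outputs)) / len(exact)
        reduction = 1.0 - used / full
        if error <= tolerance and (best is None or reduction > best[1]):
            best = (t, reduction, error)
    return best


convs = [[-4.0, 1.0, 1.0], [2.0, -1.0, 3.0], [-1.0, 3.0, 0.5]]
print(sweep_thresholds(convs, candidates=[-5.0, -3.0, -0.5], tolerance=0.1))
```

Recording `error` and `reduction` for every candidate, instead of keeping only the best one, would give sampled versions of relationships A and B for the layer.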

Based on relationships A and B, the relationship (C) between accuracy and the amount of computation reduction can then be obtained.

In addition, a loss tolerance may be set for the final output value of the neural network obtained by performing convolution in all layers. The permitted loss may be distributed among the layers to find the most effective threshold for each layer.

Specifically, a portion of the permitted loss is allotted to the layer with the best reduction per unit of accuracy lost. Based on the relationship (C) between accuracy and reduction, the reduction amount of that layer is computed, and a provisional threshold is computed from the reduction amount.

In that state, a given portion of the permitted loss is again allotted to the layer that is then most efficient, and a provisional threshold is computed in the same way. Once this process has been repeated until all of the permitted loss has been distributed, the provisional threshold set for each layer at that point may be adopted as the threshold to apply.
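The repeated allotment described above resembles a greedy allocation, which can be sketched as follows. Everything concrete here is an assumption for illustration: the permitted loss is split into equal chunks, relationship C is given per layer as a list where entry k is the reduction achieved when the layer has been allotted k chunks, and "most efficient" is taken to mean the largest additional reduction from the next chunk. Mapping each layer's final allotment back to a threshold would use relationship B and is not shown.

```python
def allocate_loss_budget(curves, chunks):
    """Greedily distribute `chunks` units of permitted loss among layers.

    curves[layer][k] is the assumed reduction of `layer` when it has been
    allotted k chunks of loss. Returns the chunk count per layer.
    """
    allotted = {layer: 0 for layer in curves}

    def gain(layer):
        # Additional reduction this layer would yield from one more chunk.
        k, curve = allotted[layer], curves[layer]
        return curve[k + 1] - curve[k] if k + 1 < len(curve) else float("-inf")

    for _ in range(chunks):
        best = max(curves, key=gain)
        if gain(best) == float("-inf"):
            break  # every curve is exhausted
        allotted[best] += 1
    return allotted


curves = {
    "conv1": [0.0, 0.30, 0.35, 0.37],
    "conv2": [0.0, 0.10, 0.30, 0.32],
}
print(allocate_loss_budget(curves, chunks=3))
```

With these made-up curves, the first chunk goes to `conv1` and the next two go to `conv2`, since after the first step `conv2` offers the larger marginal reduction.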

Meanwhile, FIG. 5 shows an example of the amount of computation reduction (horizontal axis) against the loss of accuracy (vertical axis). As in FIG. 5, the larger the reduction, the greater the loss of accuracy. According to the embodiment described above, the most effective threshold for the permitted loss can be derived and applied.

FIGS. 6 and 7 are flowcharts for explaining the method for reducing the amount of convolution computation performed by the hardware accelerator 131. The method according to the embodiments shown in FIGS. 6 and 7 includes steps processed in time series by the hardware accelerator 131 according to the embodiments of FIGS. 1 to 5. Therefore, even where omitted below, the description given above of the hardware accelerator 131 according to the embodiments of FIGS. 1 to 5 also applies to the method according to the embodiments shown in FIGS. 6 and 7.

Referring to FIG. 6, the hardware accelerator 131 for a convolutional neural network predicts whether the output value of a convolution on the input data will fall within the set range (S61) and, if the output value is predicted to fall within the set range, may stop the convolution operation (S62).

In other words, in performing a plurality of convolutions on the input data, the hardware accelerator 131 may predict whether the output value of each convolution will fall within the set range and stop any convolution operation whose output value is predicted to fall within it. It may then perform the convolution that follows the stopped convolution in order. According to an embodiment, the set range may be the negative numbers.

FIG. 7 is a flowchart detailing step S61 described above. Referring to FIG. 7, in predicting the output value, the hardware accelerator 131 may derive an intermediate result of the output value (S71) and predict, based on the derived intermediate result, whether the output value will fall within the set range (S72).

Here, the hardware accelerator 131 may predict whether the output value will fall within the set range by comparing the intermediate result with a threshold.

According to an embodiment, when the input data includes a plurality of input feature maps, the hardware accelerator 131, in deriving the intermediate result of the output value, may sequentially perform the multiplication operations between the plurality of input feature maps and the kernel and derive the intermediate result by combining the results of the multiplication operations completed by the end of each multiplication operation. The hardware accelerator 131 may then predict, based on the intermediate result so derived, whether the output value will fall within the set range.
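
As a non-limiting illustration, steps S71 and S72 can be sketched in software for a single output value. The function name `convolve_output`, the flattened list layout, and the rule "an intermediate result below the threshold predicts a negative output" are assumptions made for this sketch, and the embodiments are not limited to them.

```python
def convolve_output(patches, kernels, threshold):
    """Accumulate one output value map by map, stopping early when possible.

    patches[c], kernels[c] : flattened input window and kernel weights for
                             input feature map c (assumed layout)
    threshold              : stop when the intermediate result is below this

    Returns (value, maps_used); value is None if the operation was stopped.
    """
    partial = 0.0
    for used, (patch, kernel) in enumerate(zip(patches, kernels), start=1):
        # S71: combine the products completed so far into an intermediate result.
        partial += sum(x * w for x, w in zip(patch, kernel))
        # S72: compare with the threshold (skipped after the last map, where
        # the full output value is already known).
        if used < len(patches) and partial < threshold:
            return None, used
    return partial, len(patches)


windows = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
stopped = convolve_output(windows, [[-3.0, -2.0], [0.5, 0.5], [0.1, -0.1]], -2.0)
finished = convolve_output(windows, [[1.0, 1.0], [0.5, 0.5], [0.25, 0.25]], -2.0)
print(stopped, finished)
```

A caller could treat a stopped operation as producing zero when the layer is followed by ReLU. That mapping is likewise an assumption of this sketch.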

In addition, when the input data includes a plurality of input feature maps, the hardware accelerator 131, in predicting the output value, may sequentially perform the multiplication operations between the plurality of input feature maps and the kernel after arranging the order in which the input feature maps are processed.

Here, the hardware accelerator 131 may arrange the processing order of the input feature maps based on the weights of the kernel corresponding to each input feature map. The hardware accelerator 131 may arrange the order from the largest weight, or the largest absolute value of the weight, downward. The larger a weight or its absolute value, the greater its influence on the output value, so performing the convolution in descending order of weight or absolute weight makes it possible to determine early whether the output value will fall within the set range.
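
As a non-limiting illustration, the ordering can be sketched as follows. Scoring each input feature map by the sum of the absolute values of its kernel weights is an assumption of this sketch (the description only says that the weight or its absolute value is used), and `order_by_weight` and `maps_until_stop` are hypothetical names.

```python
def order_by_weight(kernels):
    """Return input-feature-map indices, largest total |weight| first."""
    scores = [sum(abs(w) for w in kernel) for kernel in kernels]
    return sorted(range(len(kernels)), key=lambda c: scores[c], reverse=True)


def maps_until_stop(patches, kernels, order, threshold):
    """Count the maps processed before the intermediate result drops
    below the threshold (or all of them if it never does)."""
    partial = 0.0
    for used, c in enumerate(order, start=1):
        partial += sum(x * w for x, w in zip(patches[c], kernels[c]))
        if partial < threshold:
            return used
    return len(order)


kernels = [[0.1, -0.1], [-3.0, -2.0], [0.5, 0.5]]
patches = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

order = order_by_weight(kernels)
natural = maps_until_stop(patches, kernels, [0, 1, 2], threshold=-2.0)
sorted_first = maps_until_stop(patches, kernels, order, threshold=-2.0)
print(order, natural, sorted_first)
```

With these toy values the sorted order reaches the stopping condition after one feature map instead of two, because the map with the heaviest kernel is processed first.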

The above embodiments can be applied to various neural network models. In that case, the method of reducing the amount of operations may be performed after deriving per-layer thresholds suited to the kernel weights and the number of layers.

The term '~ device' as used in the above embodiments means software or a hardware component such as an FPGA (field programmable gate array) or an ASIC, and a '~ device' performs certain roles. However, a '~ device' is not limited to software or hardware. A '~ device' may be configured to reside in an addressable storage medium and may be configured to execute on one or more processors. Thus, as an example, a '~ device' includes components such as software components, object-oriented software components, class components and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The functions provided within the components and '~ devices' may be combined into a smaller number of components and '~ devices', or further separated into additional components and '~ devices'.

In addition, the components and '~ devices' may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

The method of reducing the amount of convolution operations according to the embodiments described with reference to FIGS. 6 and 7 may also be implemented in the form of a computer-readable medium that stores instructions and data executable by a computer. The instructions and data may be stored in the form of program code and, when executed by a processor, may generate a predetermined program module to perform a predetermined operation. The computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media and removable and non-removable media. The computer-readable medium may also be a computer recording medium, which may include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, the computer recording medium may be a magnetic storage medium such as an HDD or SSD, an optical recording medium such as a CD, DVD, or Blu-ray disc, or a memory included in a server accessible through a network.

The method of reducing the amount of convolution operations according to the embodiments described with reference to FIGS. 6 and 7 may also be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or an SSD (solid-state drive)).

Accordingly, the method of reducing the amount of convolution operations according to the embodiments described with reference to FIGS. 6 and 7 may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are connected to one another by various buses and may be mounted on a common motherboard or in any other suitable manner.

Here, the processor can process instructions within the computing device. Such instructions include, for example, instructions stored in the memory or the storage device for displaying graphical information that provides a GUI (graphical user interface) on an external input/output device, such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and memory types. The processor may also be implemented as a chipset made up of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. In one example, the memory may consist of a volatile memory unit or a set of such units. In another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another type of computer-readable medium, such as a magnetic or optical disk.

The storage device can provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration including such a medium. It may include, for example, devices in a SAN (storage area network) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The embodiments described above are illustrative, and a person of ordinary skill in the art to which they pertain will understand that they can readily be modified into other specific forms without changing their technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form.

The scope of protection sought through this specification is indicated by the claims set out below rather than by the detailed description above, and should be construed as including all changes and modifications derived from the meaning and scope of the claims and their equivalents.

100: neural network system 110: input/output device
120: storage device 130: computing device
131: hardware accelerator

Claims

1. A method of reducing the amount of convolution operations, performed by a hardware accelerator for a convolutional neural network (CNN), the method comprising:
predicting, in performing a plurality of convolutions on input data, whether an output value of each convolution will be included in a set range; and
stopping a convolution operation whose output value is predicted to be included in the set range, and performing the next convolution in order.

2. The method of claim 1,
wherein the set range is the negative numbers.

3. The method of claim 1,
wherein the predicting comprises:
deriving an intermediate result of the output value; and
predicting, based on the intermediate result, whether the output value will be included in the set range.

4. The method of claim 3,
wherein the predicting, based on the intermediate result, whether the output value will be included in the set range comprises:
comparing the intermediate result with a threshold to predict whether the output value will be included in the set range.

5. The method of claim 3,
wherein the input data
comprises a plurality of input feature maps, and
wherein the deriving of the intermediate result of the output value comprises:
sequentially performing multiplication operations between the plurality of input feature maps and a kernel, and deriving the intermediate result by combining the results of the multiplication operations performed by the end of each multiplication operation.

6. The method of claim 1,
wherein the input data
comprises a plurality of input feature maps, and
wherein the predicting comprises:
sequentially performing multiplication operations between the plurality of input feature maps and a kernel, the multiplication operations being performed sequentially after the processing order of the plurality of input feature maps has been arranged.

7. The method of claim 6,
wherein the predicting comprises:
arranging the processing order of the input feature maps based on a weight of the kernel corresponding to each input feature map.

8. A computer-readable recording medium on which a program for carrying out the method of claim 1 is recorded.

9. A computer program stored on a medium and executed by a hardware accelerator for a convolutional neural network (CNN) to perform the method of claim 1.

10. A hardware accelerator for a convolutional neural network (CNN),
wherein, in performing a plurality of convolutions on input data, the hardware accelerator predicts whether an output value of each convolution will be included in a set range, stops a convolution operation whose output value is predicted to be included in the set range, and performs the next convolution in order.

11. The hardware accelerator of claim 10,
wherein the set range is the negative numbers.

12. The hardware accelerator of claim 10,
wherein, in predicting whether the output value will be included in the set range,
the hardware accelerator derives an intermediate result of the output value and predicts, based on the intermediate result, whether the output value will be included in the set range.

13. The hardware accelerator of claim 12,
wherein, in predicting, based on the intermediate result, whether the output value will be included in the set range,
the hardware accelerator compares the intermediate result with a threshold to predict whether the output value will be included in the set range.

14. The hardware accelerator of claim 12,
wherein the input data
comprises a plurality of input feature maps, and
wherein the hardware accelerator sequentially performs multiplication operations between the plurality of input feature maps and a kernel and derives the intermediate result by combining the results of the multiplication operations performed by the end of each multiplication operation.

15. The hardware accelerator of claim 10,
wherein the input data
comprises a plurality of input feature maps, and
wherein the hardware accelerator sequentially performs multiplication operations between the plurality of input feature maps and a kernel, the multiplication operations being performed sequentially after the processing order of the plurality of input feature maps has been arranged.

16. The hardware accelerator of claim 15,
wherein the hardware accelerator arranges the processing order of the input feature maps based on a weight of the kernel corresponding to each input feature map.