CN111095304A - Electronic device and control method thereof - Google Patents
- Publication number
- CN111095304A (application number CN201880057625.8A)
- Authority
- CN
- China
- Prior art keywords
- zero
- processing elements
- input
- elements
- zero element
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F17/153—Multidimensional correlation or convolution
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
An electronic device and a method thereof are provided for performing deep learning. The electronic device includes: a memory configured to store target data and core data; and a processor including a plurality of processing elements arranged in a matrix shape. The processor is configured to: input a first non-zero element of a plurality of first elements included in the target data to each of the plurality of processing elements, and sequentially input a second non-zero element of a plurality of second elements included in the core data to each of a plurality of first processing elements included in a first row of the plurality of processing elements. Each of the plurality of first processing elements is configured to perform an operation between the input first non-zero element and the input second non-zero element based on depth information of the first non-zero element and depth information of the second non-zero element.
Description
Technical Field
The present disclosure relates generally to an electronic apparatus and a control method thereof, and more particularly, to an electronic apparatus for performing convolution operation and a control method thereof.
Background
A touch sensing device such as a touch panel can provide an input method that uses the user's own body, without a separate input device such as a mouse or a keyboard. Touch sensing devices are generally applied to portable electronic devices, such as notebooks, in which it is difficult to use a separate input device.
In recent years, artificial intelligence systems that implement human-level intelligence have been used in various fields. Unlike existing rule-based intelligence systems, in artificial intelligence systems, machines learn, make decisions, and become more intelligent. Artificial intelligence systems are becoming more and more popular and existing rule-based intelligence systems are being replaced by these types of artificial intelligence systems based on deep learning.
Artificial intelligence techniques include machine learning (e.g., deep learning) and basic techniques that utilize machine learning.
Machine learning includes algorithmic techniques that can classify and learn features of input data on their own. The basic techniques use machine learning algorithms such as deep learning to simulate functions of the human brain, such as recognition and judgment, and cover technical fields such as language understanding, visual understanding, inference/prediction, knowledge representation, and motion control.
Language understanding is a technique for recognizing and applying/processing human language/characters and includes natural language processing, machine translation, dialog systems, query response, speech recognition/synthesis, and the like. Visual understanding is a technique for recognizing and processing objects in the way human vision does, and includes object recognition, object tracking, image search, human recognition, scene understanding, spatial understanding, image enhancement, and the like. Inference/prediction is a technique for judging information and performing logical inference and prediction, and includes knowledge/probability-based inference, optimization prediction, preference-based planning, recommendation, and the like.
Knowledge representation is a technique for automating human experience information into knowledge data, including knowledge construction (data generation/classification) and knowledge management (data utilization). Motion control is a technique for controlling the automatic driving of a vehicle and the motion of a robot, and includes motion control (navigation, collision, driving), steering control (behavior control), and the like.
In particular, Convolutional Neural Networks (CNNs) have a structure for learning two-dimensional data or three-dimensional data, and can be trained through a back propagation algorithm. CNNs are widely used in various application fields such as object classification, object detection, and the like.
Most of the operations of a CNN are convolution operations, and most of a convolution operation consists of multiplications between input data. However, the target data (e.g., an image) and the kernel data, which are the input data, may include many zeros, and in these cases it is unnecessary to perform the multiplication.
For example, when at least one operand of a multiplication between input data is zero, the multiplication result is zero. That is, if at least one operand is zero, the result is known to be zero even if the multiplication is not performed. Therefore, the operation period can be shortened by omitting such unnecessary multiplications, which is referred to as handling data sparsity.
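The principle can be summarized in a short sketch. The following Python snippet is illustrative only (the function name and values are assumptions, not part of the disclosure); it shows that a multiplication is issued only when both operands are non-zero, so the number of multiplication cycles falls with the sparsity of the inputs.

```python
# Sketch of the zero-skipping idea: a multiply is issued only when both
# operands are non-zero, so the number of multiplication cycles drops with
# the sparsity of the inputs. Values are illustrative.
def sparse_dot(xs, ys):
    multiplies = 0
    total = 0.0
    for x, y in zip(xs, ys):
        if x != 0 and y != 0:      # skip: the product is known to be zero
            total += x * y
            multiplies += 1
    return total, multiplies

print(sparse_dot([0, 2.0, 0, 1.0], [3.0, 0, 0, 4.0]))  # (4.0, 1) -> one multiply instead of four
```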
However, in the related art, methods have been developed for handling data sparsity only when the plurality of processing elements is implemented in the form of a one-dimensional array. Therefore, there is a need for a method of handling data sparsity when a plurality of processing elements is implemented in a two-dimensional array.
Disclosure of Invention
Technical problem
The present disclosure has been made to solve the above-mentioned problems and disadvantages, and to provide at least the advantages described below.
Means for solving the problems
Accordingly, it is an aspect of the present disclosure to provide an electronic apparatus and a control method thereof that omit unnecessary operations in convolution operation processing to improve operation speed.
Another aspect of the present disclosure is to provide an electronic apparatus and a control method thereof that can improve a speed of a convolution operation by omitting an operation of partial target data and partial kernel data according to zeros included in the target data.
According to an aspect of the present disclosure, an electronic device is provided for performing deep learning. The electronic device includes: a memory configured to store target data and core data; and a processor including a plurality of processing elements arranged in a matrix shape, the processor being configured to: input a first non-zero element of a plurality of first elements included in the target data to each of the plurality of processing elements, and sequentially input a second non-zero element of the plurality of elements included in the core data to each of the plurality of first processing elements included in the first row of the plurality of processing elements, wherein each of the plurality of first processing elements is configured to perform an operation between the input first non-zero element and the input second non-zero element based on depth information of the first non-zero element and depth information of the second non-zero element.
According to another aspect of the present disclosure, a method is provided for controlling an electronic device to perform deep learning. The method comprises the following steps: inputting a first non-zero element of a plurality of first elements included in the target data to each of the plurality of processing elements; sequentially inputting a second non-zero element of a plurality of elements included in the core data to each of a plurality of first processing elements included in a first row of the plurality of processing elements; and performing an operation between the input first non-zero element and the input second non-zero element based on the depth information of the first non-zero element and the depth information of the second non-zero element.
Advantageous effects of the invention
According to various embodiments of the present disclosure as described above, an electronic device may increase the speed of convolution operations by omitting operations of partial target data and partial kernel data according to zeros included in the target data.
Drawings
The above and/or other aspects, features and advantages of particular embodiments of the present disclosure will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
FIGS. 1A and 1B illustrate convolution operations between three-dimensional input data according to an embodiment;
FIG. 2 illustrates an electronic device according to an embodiment;
FIG. 3 illustrates a plurality of processing elements according to an embodiment;
FIGS. 4A-4D illustrate a method for inputting non-zero elements in target data and core data, according to an embodiment;
FIGS. 5A-5M illustrate an operation cycle of a processing element according to an embodiment;
FIGS. 6A and 6B illustrate a method for processing data sparsity of core data according to an embodiment;
FIGS. 7A and 7B illustrate a method for processing data sparsity of target data according to an embodiment;
FIG. 8 illustrates a processing element according to an embodiment; and
FIG. 9 is a flowchart illustrating a method of controlling an electronic device according to an embodiment.
Best Mode for Carrying Out The Invention
Detailed Description
Hereinafter, various embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that there is no intention to limit the disclosure to the specific forms disclosed herein; on the contrary, the disclosure is to be construed as covering various modifications, equivalents, and/or alternatives to the embodiments of the disclosure.
In describing the drawings, like reference numerals may be used to designate like constituent elements. A detailed description of known functions or configurations will be omitted for clarity and conciseness.
Fig. 1A and 1B illustrate convolution operations between three-dimensional input data according to an embodiment. The convolution operation accounts for a very large share of the computation in deep learning, and it emphasizes, in the target data, the characteristics corresponding to the kernel data through an operation between the target data and the kernel data.
Referring to fig. 1A, the left side of fig. 1A shows an example of three-dimensional target data (feature map data), and the right side of fig. 1A shows an example of three-dimensional core data. For example, the target data is three-dimensional data including four rows and four columns and having a depth of five, and the core data is three-dimensional data including two rows and two columns and having a depth of five.
Referring to fig. 1B, there is shown output data according to the convolution operation of the target data and the kernel data of fig. 1A, which is two-dimensional data including three rows and three columns.
In the output data, Out11 may be calculated using equation (1).
Out11 = F11,1×A,1 + F11,2×A,2 + F11,3×A,3 + F11,4×A,4 + F11,5×A,5 + F12,1×B,1 + F12,2×B,2 + F12,3×B,3 + F12,4×B,4 + F12,5×B,5 + F21,1×D,1 + F21,2×D,2 + F21,3×D,3 + F21,4×D,4 + F21,5×D,5 + F22,1×C,1 + F22,2×C,2 + F22,3×C,3 + F22,4×C,4 + F22,5×C,5 … (1)
In equation (1), in a term such as F11,1, the left side of the comma represents the row and column of the target data, and the right side represents the depth of the target data. For example, F21,3 indicates the element in the second row, first column, and third depth of the target data, and the remaining target data are denoted in the same manner. In a term such as A,1, the left side of the comma indicates the row and column position of the kernel data, and the right side indicates the depth of the kernel data. For example, D,4 represents the element in the second row, first column, and fourth depth of the kernel data, and the remaining kernel data are denoted in the same manner. Hereinafter, this notation is used for ease of description.
The remainder of the output data may be calculated by operating on the same core data with other rows and columns of target data. For example, Out23 in the output data may be calculated by operating data included in all depths of F23, F24, F33, and F34 from the target data with the kernel data.
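For reference, equation (1) and the computation of the remaining output elements can be expressed as a minimal sketch. The following Python code is an illustration under the shapes of fig. 1A (4×4×5 target, 2×2×5 kernel); the array names and random values are assumptions, not part of the disclosure.

```python
import numpy as np

# Minimal sketch of equation (1): one output element of a 3-D convolution.
# Shapes follow FIG. 1A: target is 4x4x5 (rows x cols x depth), kernel is 2x2x5.
target = np.random.rand(4, 4, 5)   # F[row, col, depth]
kernel = np.random.rand(2, 2, 5)   # kernel[row, col, depth]; A, B in the first row, D, C in the second

def output_element(out_row, out_col):
    # Out[r][c] sums, over every kernel row/column and every depth,
    # the product of the overlapping target element and kernel element.
    acc = 0.0
    for kr in range(kernel.shape[0]):
        for kc in range(kernel.shape[1]):
            for d in range(kernel.shape[2]):
                acc += target[out_row + kr, out_col + kc, d] * kernel[kr, kc, d]
    return acc

out11 = output_element(0, 0)   # corresponds to Out11 in equation (1)
# Cross-check against a direct elementwise sum over the 2x2x5 window.
assert np.isclose(out11, np.sum(target[0:2, 0:2, :] * kernel))
```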
As described above, in order to perform a convolution operation between three-dimensional input data, the depths of the three-dimensional input data must be the same. Further, even though the input data are three-dimensional, the output data may be two-dimensional.
In addition, fig. 1B shows the result when the operations on the contour (boundary) pixels of the target data are omitted; when operations on the contour pixels are added, output data of a different form may be generated.
In the following description, for convenience of description, the individual data constituting the target data (such as F11,1, F11,2, F11,3, F11,4, F11,5, F21,1, …, F44,4, and F44,5) are described as first elements, and the individual data constituting the kernel data (such as A,1, A,2, A,3, A,4, B,1, …, C,4, D,1, D,2, D,3, and D,4) are described as second elements. In addition, in the following drawings, the reference directions of rows, columns, and depths are the same as those shown in fig. 1A and 1B.
Fig. 2 shows an electronic device according to an embodiment.
Referring to fig. 2, the electronic device 100 includes a memory 110 and a processor 120.
The electronic device 100 may perform deep learning, i.e., convolution operations. For example, the electronic device 100 may be a desktop Personal Computer (PC), a notebook, a smartphone, a tablet PC, a server, and so forth. Alternatively, the electronic apparatus 100 may be a system itself in which a cloud computing environment is built. However, the present disclosure is not limited thereto, and the electronic apparatus 100 may be any device capable of performing a convolution operation.
The memory 110 may store target data, core data, and the like. The target data and the core data may be stored to correspond to the type of the memory 110. For example, the memory 110 may include a plurality of two-dimensional cells, and the three-dimensional target data and the kernel data may be stored in the plurality of two-dimensional cells.
The processor 120 may identify data stored in the plurality of two-dimensional cells as three-dimensional target data and nuclear data. For example, the processor 120 may recognize data stored in the cells 1 to 25 among the plurality of cells as data of a first depth of the target data, and recognize data stored in the cells 26 to 50 among the plurality of cells as data of a second depth of the target data.
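As an illustration of this cell-to-depth interpretation, the following Python sketch maps a 1-based flat cell index to a (row, column, depth) position, assuming 5×5 depth slices as in the cells-1-to-25 example above; the function name and slice size are illustrative assumptions.

```python
# Sketch of the cell-to-depth mapping described above (illustrative only):
# a flat memory of cells is interpreted as consecutive depth slices of the
# target data, e.g. cells 1-25 hold depth 1 of a 5x5 slice, cells 26-50 depth 2.
ROWS, COLS = 5, 5
SLICE = ROWS * COLS

def cell_to_coordinates(cell_index):
    """Map a 1-based cell index to (row, col, depth), all 1-based."""
    flat = cell_index - 1
    depth = flat // SLICE + 1
    offset = flat % SLICE
    row = offset // COLS + 1
    col = offset % COLS + 1
    return row, col, depth

assert cell_to_coordinates(1) == (1, 1, 1)    # first cell -> depth 1
assert cell_to_coordinates(26) == (1, 1, 2)   # cell 26 -> depth 2
```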
The core data may be generated by the electronic device 100, or may be generated by an external electronic device (i.e., a device other than the electronic device 100) and received from the external electronic device. The target data may be information received from an external electronic device.
The memory 110 may be implemented as a hard disk, a non-volatile memory, a volatile memory, or the like.
The processor 120 generally controls the operation of the electronic device 100.
The processor 120 may be implemented as a Digital Signal Processor (DSP), a microprocessor, or a Time Controller (TCON), but is not limited thereto, and may include at least one of a Central Processing Unit (CPU), a Micro Controller Unit (MCU), a Micro Processing Unit (MPU), a controller, an Application Processor (AP), a Communication Processor (CP), and an ARM processor. The processor 120 may be implemented in the form of a system on chip (SoC), a Large Scale Integration (LSI) in which a processing algorithm is embedded, or a Field Programmable Gate Array (FPGA).
The processor 120 may include a plurality of processing elements arranged in a matrix form, and may control the operation of the plurality of processing elements.
FIG. 3 illustrates a plurality of processing elements according to an embodiment.
Referring to fig. 3, a plurality of Processing Elements (PEs) are arranged in a matrix form, and data may be shared between neighboring processing elements. Although fig. 3 illustrates transmitting data from the upper side to the lower side, the present disclosure is not limited thereto, and data may be transmitted from the lower side to the upper side.
Each of the plurality of processing elements includes a multiplier and an Arithmetic Logic Unit (ALU). The ALU may comprise at least one adder. Each of the plurality of processing elements may perform arithmetic operations using a multiplier and an ALU. Further, each of the plurality of processing elements may include a plurality of register files.
The processor 120 may input a first non-zero element of a plurality of first elements included in the target data to each of the plurality of processing elements. For example, processor 120 may identify a first non-zero element (i.e., an element that is not zero) from the target data stored in memory 110 and input the identified first non-zero element into the plurality of processing elements. That is, the processor 120 may extract only the first non-zero elements from the target data stored in the memory 110 in real time.
Alternatively, before inputting the first non-zero element to the plurality of processing elements, the processor 120 may extract only the first non-zero element from the target data and store the first non-zero element in the memory 110. The memory 110 may store the target data and the extracted first non-zero element. Processor 120 may input the extracted first non-zero element directly into a plurality of processing elements. Processor 120 may identify a corresponding processing element among the plurality of processing elements based on the row information and the column information of the first non-zero element and input the first non-zero element to the identified processing element.
For example, if the first non-zero element is in a first row and a first column, processor 120 may be configured to input the first non-zero element to a first processing element of the plurality of processing elements, and if the first non-zero element is in a second row and a second column, the first non-zero element may be input to a second processing element of the plurality of processing elements. The first non-zero elements belonging to the first row and the first column may include a plurality of elements having different depths, and the processor 120 may input the plurality of first non-zero elements belonging to the first row and the first column to each of the plurality of register files of the first processing element.
For example, the processing elements may include a first register file corresponding to a first depth of the target data, a second register file corresponding to a second depth, …, and an nth register file corresponding to an nth depth, and the processor 120 may input an element of the first depth from among first non-zero elements belonging to the first row and the first column to the first register file included in the first processing element and input an element of the second depth to the second register file included in the first processing element. The second register file comprised in the first processing element may not store an element of the second depth if the element is not present in the first non-zero element belonging to the first row and the first column. However, the present disclosure is not limited thereto, and the processor 120 may sequentially input the first non-zero element into the plurality of register files included in the identified processing element without considering the depth information of the first non-zero element. For example, processor 120 may store the depth information of the first non-zero element stored in each register file along with the first non-zero element.
If the first non-zero element belonging to the first row and the first column is a first depth element, a third depth element, or a fourth depth element, the processor 120 may sequentially input the first non-zero element to the first register file, the second register file, and the third register file. Processor 120 may store a first non-zero element stored in the first register file as an element of a first depth, a first non-zero element stored in the second register file as an element of a third depth, and a first non-zero element stored in the third register file as an element of a fourth depth.
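The extraction of first non-zero elements for one position, together with their depth indices, can be sketched as follows. This is an illustrative Python sketch only; the function name and values are assumptions, and it mirrors the register-file example above (depths 1, 3, and 4 present in the first row and first column).

```python
import numpy as np

# Sketch: extract the non-zero elements of one target position (row, col)
# in depth order and store each with its depth index, mirroring the
# register-file example above (depth 1 in the first register file,
# depth 3 in the second, depth 4 in the third). Names are illustrative.
def extract_nonzero_column(target, row, col):
    register_files = []
    for depth in range(target.shape[2]):
        value = target[row, col, depth]
        if value != 0:
            register_files.append((depth + 1, value))  # (depth index, element)
    return register_files

target = np.zeros((4, 4, 5))
target[0, 0, 0], target[0, 0, 2], target[0, 0, 3] = 1.5, -2.0, 0.7  # depths 1, 3, 4
print(extract_nonzero_column(target, 0, 0))
# [(1, 1.5), (3, -2.0), (4, 0.7)] -> register files 1..3 hold depths 1, 3, 4
```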
The processor 120 may sequentially input a second non-zero element of the plurality of second elements included in the core data to each of the plurality of first processing elements included in the first row of the plurality of processing elements.
The processor 120 may identify a second non-zero element from the core data stored in the memory 110 and sequentially input the identified second non-zero element to each of the plurality of first processing elements. That is, processor 120 may extract only the second non-zero elements from the core data stored in memory 110 in real time.
Here, sequential input refers to the order in which the plurality of second non-zero elements are input, one per cycle. For example, if there are a second non-zero element of the first depth, a second non-zero element of the second depth, and a second non-zero element of the third depth, the processor 120 may input the second non-zero element of the first depth to each of the plurality of first processing elements in the first cycle, input the second non-zero element of the second depth to each of the plurality of first processing elements in the second cycle, and input the second non-zero element of the third depth to each of the plurality of first processing elements in the third cycle.
Alternatively, the processor 120 may extract only the second non-zero elements from the core data and store the extracted second non-zero elements in the memory 110 before inputting the second non-zero elements to each of the plurality of first processing elements. In this case, the memory 110 may store the core data and the extracted second non-zero element. The processor 120 may sequentially input the extracted second non-zero elements to each of the plurality of first processing elements.
The plurality of first processing elements included in the first row of the plurality of processing elements may be the processing elements arranged along one edge of the matrix of the plurality of processing elements. For example, the plurality of first processing elements may be the four processing elements arranged at the top of fig. 3.
The processor 120 may sequentially input the second non-zero element to each of the plurality of first processing elements based on the row information, the column information, and the depth information of the second non-zero element. The processor 120 may sequentially input the second non-zero element to the plurality of first processing elements along with the depth information of the second non-zero element.
The processor 120 sequentially inputs the second non-zero elements included in one row and one column to each of the plurality of first processing elements based on the depth. When all of the second non-zero elements included in one row and one column are input to each of the plurality of first processing elements, second non-zero elements included in a row and a column different from the one row and the one column are input to each of the plurality of first processing elements.
For example, the processor 120 may sequentially input the second non-zero elements included in the first row and the first column to each of the plurality of first processing elements in order of depth, and when the input of the second non-zero elements included in the first row and the first column is completed, the processor may sequentially input the second non-zero elements included in the first row and the second column to each of the plurality of first processing elements in order of depth.
In addition, when there is no second non-zero element in one row and one column, the processor 120 inputs a zero to each of the plurality of first processing elements. After the zero is input to each of the plurality of first processing elements, the processor 120 may input the second non-zero elements (or a zero) included in a different row and column to each of the plurality of first processing elements, based on the number of second non-zero elements included in the different row and column.
The zero is input because, when the operations for the elements corresponding to one row and one column are completed, the accumulated result must be shifted.
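The resulting input order of the second non-zero elements can be sketched as a stream builder: within each kernel position, non-zero elements are emitted in depth order with their depth information, and a single zero is emitted for a position with no non-zero element so that the shift still takes place. The following Python sketch is illustrative; the position labels, function name, and values are assumptions.

```python
# Sketch of the kernel streaming order described above: for each kernel
# position (in the traversal order A -> B -> C -> D), emit its non-zero
# elements in depth order together with their depth; emit a single zero
# when the position has no non-zero element, so the shift still occurs.
def kernel_stream(kernel_nonzeros, position_order):
    """kernel_nonzeros: dict position -> list of (depth, value), depth-sorted."""
    stream = []
    for pos in position_order:
        elements = kernel_nonzeros.get(pos, [])
        if elements:
            for depth, value in elements:
                stream.append((pos, depth, value))
        else:
            stream.append((pos, None, 0.0))   # placeholder zero cycle
    return stream

# Illustrative values; position D (second row, first column) has no non-zero
# element, as in FIG. 5L.
nonzeros = {"A": [(1, 0.5), (3, 1.2)], "B": [(1, 2.0), (2, 0.3)], "C": [(2, 0.9)], "D": []}
for cycle, entry in enumerate(kernel_stream(nonzeros, ["A", "B", "C", "D"]), start=1):
    print(cycle, entry)
```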
When a depth without a first non-zero element in all rows and columns is identified from among the first non-zero elements stored in each of the plurality of processing elements, the processor 120 may omit input of a second non-zero element corresponding to the depth and sequentially input the second non-zero element not corresponding to the depth to each of the plurality of first processing elements.
For example, if there is no first non-zero element corresponding to the third depth among the first non-zero elements stored in each of the plurality of processing elements, the processor 120 may omit the input of the second non-zero elements corresponding to the third depth among the second elements. More specifically, if the second non-zero elements belonging to the first row and the first column are elements of the first depth, the third depth, and the fourth depth, the processor 120 may input the element of the first depth from among the second non-zero elements belonging to the first row and the first column to each of the plurality of first processing elements, and when the cycle changes, the processor 120 may input the element of the fourth depth from among the second non-zero elements belonging to the first row and the first column to each of the plurality of first processing elements. That is, if the element of the third depth among the second non-zero elements belonging to the first row and the first column were input to each of the plurality of first processing elements, the operation result would be zero because there is no first non-zero element corresponding to the third depth, and therefore the processor 120 may shorten the operation period by not inputting the element of the third depth among the second non-zero elements belonging to the first row and the first column.
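A minimal sketch of this depth skipping is shown below: kernel elements whose depth never occurs among the stored first non-zero elements are removed from the stream, since their products would be zero in every processing element. The function name and example values are illustrative assumptions.

```python
# Sketch of the depth-skipping described above: kernel elements whose depth
# never appears among the stored target non-zero elements are dropped from
# the stream, since their products would all be zero anyway.
def prune_kernel_by_target_depths(kernel_nonzeros, target_depths_present):
    pruned = {}
    for pos, elements in kernel_nonzeros.items():
        pruned[pos] = [(d, v) for d, v in elements if d in target_depths_present]
    return pruned

target_depths_present = {1, 4, 5}            # e.g. no target element has depth 2 or 3
kernel_nonzeros = {"A": [(1, 0.5), (3, 1.2)]}
print(prune_kernel_by_target_depths(kernel_nonzeros, target_depths_present))
# {'A': [(1, 0.5)]} -> the depth-3 kernel element is never streamed
```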
Optionally, processor 120 may also include a plurality of preliminary processing elements. When, among the first non-zero elements stored in each of the plurality of processing elements, a depth has fewer than a predetermined number of first non-zero elements across all rows and columns, the processor 120 may omit the input of the second non-zero elements corresponding to that depth, sequentially input the second non-zero elements not corresponding to that depth to each of the plurality of first processing elements, and input the first non-zero elements corresponding to that depth and the second non-zero elements corresponding to that depth to the plurality of preliminary processing elements to perform the operation.
For example, among the first non-zero elements stored in each of the plurality of processing elements, if the first non-zero element corresponding to the third depth is less than five, the processor 120 may omit input of the second non-zero element corresponding to the third depth, and sequentially input the second non-zero element not corresponding to the third depth to each of the plurality of first processing elements, and input the first non-zero element corresponding to the third depth and the second non-zero element corresponding to the third depth to the plurality of preliminary processing elements to perform an operation.
Each of the plurality of first processing elements may perform an operation between the input first non-zero element and the input second non-zero element based on the depth information of the first non-zero element and the depth information of the second non-zero element.
The remaining processing elements of the plurality of processing elements may receive a second non-zero element from an adjacent processing element. Each of the remaining processing elements may perform an operation between the first non-zero element of the input and the second non-zero element of the input based on the depth information of the first non-zero element and the depth information of the second non-zero element.
The first non-zero element and the second non-zero element may be input to each of the plurality of processing elements on a cycle-by-cycle basis. In this case, each of the plurality of processing elements may perform an operation between the first non-zero element and the second non-zero element input by cycles based on the respective depth information.
Alternatively, the first non-zero elements may be input to the plurality of processing elements in advance, all at once, and the second non-zero elements may be input to each of the plurality of processing elements in each cycle. In this case, each of the plurality of processing elements may perform an operation between a first non-zero element stored in advance and a second non-zero element input in the cycle, based on the respective depth information.
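The per-cycle behavior of a single processing element described above can be sketched as follows: an incoming second non-zero element carries its depth information, and it is multiplied with the stored first non-zero element of the same depth (if one exists) and accumulated; otherwise the multiplication is skipped. This Python class is an illustrative sketch, not the disclosed hardware; names and values are assumptions.

```python
# Sketch of the per-cycle behaviour of one processing element: the incoming
# kernel element carries a depth tag; the PE multiplies it with its stored
# target element of the same depth (if any) and accumulates the product.
class ProcessingElementSketch:
    def __init__(self, target_elements):
        # target_elements: dict depth -> stored first non-zero element
        self.target_elements = dict(target_elements)
        self.accumulator = 0.0

    def cycle(self, kernel_depth, kernel_value):
        if kernel_depth in self.target_elements:          # depths match
            self.accumulator += self.target_elements[kernel_depth] * kernel_value
        # otherwise the multiplication is skipped entirely

pe = ProcessingElementSketch({1: 1.5, 4: 0.7})
pe.cycle(1, 0.5)   # depth 1 matches -> accumulate 0.75
pe.cycle(3, 1.2)   # depth 3 absent  -> no operation
print(pe.accumulator)   # 0.75
```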
When the operation between the non-zero elements of the plurality of first processing elements is completed, the processor 120 may control the plurality of processing elements to shift a second non-zero element input to the plurality of first processing elements to each of the plurality of second processing elements included in the second row. When operations between non-zero elements are completed in the plurality of second processing elements, the processor 120 may control the plurality of processing elements to shift the second non-zero element shifted to the plurality of second processing elements to each of a plurality of third processing elements included in a third row of the plurality of processing elements.
When the second non-zero element input to each of the plurality of processing elements is included in the same row and the same column as the second non-zero element used in the operation performed immediately before, the processor 120 may accumulate the operation result of the input second non-zero element with the previous operation result and store the accumulated operation result in one of the plurality of register files. Here, the plurality of register files may include register files storing the first non-zero elements and a register file for accumulating and storing the operation result.
When the second non-zero element input to each of the plurality of processing elements is not included in the same row and the same column as the second non-zero element used in the operation performed immediately before, the processor 120 may shift the operation result stored in one of the plurality of register files of each processing element to an adjacent processing element, accumulate the operation result of the input second non-zero element onto the shifted operation result, and store the accumulated result in one of the plurality of register files.
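The accumulate-or-shift decision can be sketched as follows: a product is accumulated while the kernel row/column position stays the same, and when the position changes, the partial sum is shifted out to a neighbor and accumulation continues on the partial sum shifted in from the other side. This is an illustrative Python sketch; the class name, the shift callbacks, and the values are assumptions.

```python
# Sketch of the accumulate-or-shift decision described above: a product is
# added to the running partial sum while the kernel position stays the same;
# when the position changes, the partial sum is shifted out to a neighbour
# and accumulation continues on the partial sum shifted in from the other side.
class PartialSumRegister:
    def __init__(self):
        self.partial_sum = 0.0
        self.last_position = None

    def update(self, position, product, shift_out, shift_in):
        if self.last_position is not None and position != self.last_position:
            shift_out(self.partial_sum)        # hand the finished sum to a neighbour
            self.partial_sum = shift_in()      # continue on the neighbour's sum
        self.partial_sum += product
        self.last_position = position

reg = PartialSumRegister()
outbox = []
reg.update("A", 0.75, outbox.append, lambda: 0.0)
reg.update("A", 0.30, outbox.append, lambda: 0.0)
reg.update("B", 0.10, outbox.append, lambda: 0.0)  # position change triggers a shift
print(outbox, reg.partial_sum)                     # [1.05] 0.1
```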
By the above method, the processor 120 may reduce unnecessary operations between the target data and the core data.
Fig. 4A to 4D illustrate a method for inputting non-zero elements from target data and core data according to an embodiment.
Referring to fig. 4A, the left side of fig. 4A shows three-dimensional object data, and the right side of fig. 4A shows three-dimensional first kernel data and three-dimensional second kernel data.
Since the core data is sequentially input to the plurality of first processing elements, the plurality of core data can be easily operated.
In fig. 4A, the first arrow direction toward the upper right end indicates the depth direction, and the second arrow, rotated in a clockwise direction, indicates the operation order of the core data. When the operation of the core data corresponding to the depth of A is completed, the operation of the core data corresponding to the depth of B may be performed. That is, the operation order is A -> B -> C -> D, each position over its entire depth.
Referring to fig. 4B, the upper left end of fig. 4B shows a first line in the target data, and the lower left end of fig. 4B shows a second line in the target data. The arrow direction indicates the depth direction as indicated by the first arrow direction in fig. 4A.
The numbers shown on the left side of fig. 4B represent the depth indices at which the elements are not zero, while a depth without a number represents a zero element. For example, in the first row and the first column of the target data, the elements of the first depth, the fourth depth, and the fifth depth are not zero, and the elements of the second depth and the third depth are zero.
The right side of fig. 4B shows only the first non-zero element from the left side of fig. 4B. As shown on the left side of fig. 4B, processor 120 may identify a first non-zero element from the target data and input the identified first non-zero element into the plurality of processing elements. Alternatively, the processor 120 may extract only the first non-zero element as shown on the right side of fig. 4B, store the extracted first non-zero element separately in the memory 110, and extract the stored first non-zero element to input the element to the plurality of processing elements. In this case, as shown in fig. 4B, the processor 120 may first extract the first non-zero element in the depth direction of F11 of the first row and then move to the side to extract the first non-zero element in the depth direction of F12. The processor 120 may extract the first non-zero element in the depth direction of each of F13 and F14 in the same manner. Processor 120 may extract the first non-zero element of the second row in the same manner.
In fig. 4B, only the first and second rows are shown in the target data for convenience of description, and only the first and second rows of the target data will be described below for convenience of description. However, the operations for the remaining rows are the same as the operations for the first and second rows.
Referring to fig. 4C, the left side of fig. 4C shows the first core data and the second core data in rows and columns. The arrow direction in fig. 4C indicates the depth direction, as shown by the first arrow direction in fig. 4A. The numbers shown on the left side of fig. 4C represent the depth indices at which the elements are not zero, while a depth without a number represents a zero element. For example, in the first row and the first column of the kernel data, the elements of the first depth and the third depth are not zero, while the elements of the second depth, the fourth depth, and the fifth depth are zero.
The right side of fig. 4C shows only the second non-zero elements from the left side of fig. 4C. The processor 120 may identify the second non-zero elements shown on the left side of fig. 4C from the core data and sequentially input the identified second non-zero elements to the plurality of first processing elements. Alternatively, the processor 120 may extract only the second non-zero elements as shown on the right side of fig. 4C, store the extracted second non-zero elements separately in the memory 110, and read out the stored second non-zero elements to sequentially input them to the plurality of first processing elements. In this case, the processor 120 may first extract the second non-zero elements in the depth direction of A as shown in fig. 4C, and then move to the side to extract the second non-zero elements in the depth direction of B in fig. 4C. The processor 120 may extract the second non-zero elements in the depth direction of each of C and D in the same manner.
For example, as shown in FIG. 4D, processor 120 may include multiple processing elements in a 4 × 4 matrix. The four processing elements included in the first row 410 at the upper end of the plurality of processing elements are referred to as a plurality of first processing elements.
The processor 120 may input the first non-zero element included in the second row of the target data to four processing elements (hereinafter, referred to as a plurality of second processing elements) included in a row located below the first row 410. For example, the processor 120 may input elements of the first depth, the second depth, the third depth, and the fourth depth included in the second row and the first column of the target data to a processing element, which is first from the left side, among the plurality of second processing elements, input elements of the fourth depth and the fifth depth included in the second row and the second column of the target data to a processing element, which is second from the left side, among the plurality of second processing elements, input elements of the third depth included in the second row and the third column of the target data to a processing element, which is third from the left side, among the plurality of second processing elements, and input elements of the second depth, the third depth, the fourth depth, and the fifth depth included in the second row and the fourth column of the target data to a processing element, which is fourth from the left side, among the plurality of second processing elements.
The processor 120 may sequentially input the second non-zero elements included in the first row and the first column of the first core data to the plurality of first processing elements in order of depth.
The processor 120 may sequentially input second non-zero elements included in the first row and the first column of the first core data to the plurality of first processing elements, sequentially input second non-zero elements included in the first row and the second column of the first core data to the plurality of first processing elements, sequentially input second non-zero elements included in the second row and the second column of the first core data to the plurality of first processing elements, and sequentially input second non-zero elements included in the second row and the first column of the first core data to the plurality of first processing elements.
The processor 120 may sequentially input the second non-zero elements included in the first core data to the plurality of first processing elements, and sequentially input the second non-zero elements included in the second core data to the plurality of first processing elements.
For example, the processor 120 may sequentially input elements of a first depth and a third depth included in the first row and the first column of the first kernel data to the plurality of first processing elements, sequentially input elements of the first depth, the second depth, the third depth, the fourth depth, and the fifth depth included in the first row and the second column of the first kernel data to the plurality of first processing elements, and sequentially input elements of the first depth, the second depth, the third depth, and the fifth depth included in the second row and the second column to the plurality of first processing elements. If the second non-zero element is not included in the second row and the first column of the first core data, the processor 120 may input a zero to the plurality of first processing elements. In addition, the processor 120 may sequentially input the second non-zero elements of the second core data to the plurality of first processing elements, and the input order may be the same as the first core data.
Each of the plurality of first processing elements may shift a second non-zero element of the input to an adjacent second processing element of the plurality of second processing elements when the period changes. Each of the plurality of second processing elements may shift a second non-zero element of the input to a processing element adjacent in a downward direction.
Alternatively, rather than inputting all of the first non-zero elements at once, the processor 120 may input the first non-zero elements corresponding to the plurality of first processing elements and input the first of the second non-zero elements to the plurality of first processing elements in the first cycle. Thereafter, the processor 120 may input the first non-zero elements corresponding to the plurality of second processing elements and input a second non-zero element to the plurality of second processing elements in the second cycle. That is, the processor 120 may input a portion of the first non-zero elements cycle by cycle.
Fig. 5A to 5M illustrate operations of a processing element performed by a cycle according to an embodiment. For convenience of description, fig. 5A to 5M will be described with reference to the plurality of first processing elements and the plurality of second processing elements in fig. 4A to 4D. Specifically, fig. 5A to 5M show a plurality of first processing elements on the upper side and a plurality of second processing elements on the lower side. Further, in each processing element, the left side represents a first non-zero element, the middle represents a second non-zero element, and the right side represents the processing result.
Referring to fig. 5A, the upper left side of fig. 5A illustrates one of the plurality of first processing elements, and the left side 510 represents first non-zero elements of a first depth, a fourth depth, and a fifth depth included in a first row and a first column in the target data, the middle element 520 represents a second non-zero element of the first depth included in the first row and the first column in the first core data, and the right side 530 represents an operation result. However, the description of the specific operation result value is omitted in the right side 530.
As shown in fig. 5A, processor 120 may input a first non-zero element into a first plurality of processing elements and a plurality of second processing elements in a first cycle. However, the present disclosure is not limited thereto, and the processor 120 may input the first non-zero elements to the plurality of first processing elements in the first period, and input the first non-zero elements to the plurality of second processing elements in the second period. The processor 120 may input a first non-zero element corresponding to each processing element, and further description will be omitted.
Each of the plurality of first processing elements may perform an operation between the input first non-zero element and the input second non-zero element and store the operation result, based on the depth information of the input first non-zero element and the depth information of the input second non-zero element. For example, the input second non-zero element is an element of the first depth, and thus the first, third, and fourth processing elements from the left, which store a first non-zero element of the first depth, may perform an operation between the first non-zero element and the second non-zero element. Among the plurality of first processing elements, the second processing element from the left, which does not store a first non-zero element of the first depth, does not perform an operation between the first non-zero element and the second non-zero element. The result of the operation is stored in each processing element and is not shifted to adjacent processing elements.
The plurality of second processing elements do not perform operations because the second non-zero element is not input.
Referring to fig. 5B, processor 120 may input a second non-zero element to the plurality of first processing elements. Here, the input second non-zero element is a second non-zero element of a third depth included in the first row and the first column of the first kernel data.
Each of the plurality of first processing elements may shift the second non-zero element input in the first cycle to an adjacent second processing element.
Each of the plurality of first processing elements may perform an inter-element operation between the input first non-zero element and the input second non-zero element. Each of the plurality of first processing elements may add the operation result of the second cycle to the operation result of the first cycle and shift the added operation result to an adjacent processing element. The shift is performed because all of the second non-zero elements included in the first row and the first column of the first core data have now been input. That is, the second non-zero element input in the second cycle is the last second non-zero element included in the first row and the first column of the first core data.
The shift direction is determined by the row and column of the first kernel data at which the element to be input in the next cycle is located. In the third cycle, the second non-zero element of the first depth included in the first row and the second column of the first kernel data is to be input, and this position is to the right of the first row and the first column of the first kernel data; that is, the shift direction is to the right. If a second non-zero element of the first depth included in the second row and the first column were to be input in the third cycle, that position would be below the first row and the first column of the first kernel data, and the shift direction would be downward.
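The shift-direction rule can be sketched as a small helper that compares the row/column position of the current kernel element with that of the element to be input next. The following Python sketch is illustrative; positions are given as (row, column) pairs, and the function name is an assumption.

```python
# Sketch of the shift-direction rule described above: the partial sums move
# in the direction of the next kernel position relative to the current one
# (kernel positions given as (row, column), both 1-based).
def shift_direction(current_pos, next_pos):
    d_row = next_pos[0] - current_pos[0]
    d_col = next_pos[1] - current_pos[1]
    if d_col > 0:
        return "right"
    if d_col < 0:
        return "left"
    if d_row > 0:
        return "down"
    if d_row < 0:
        return "up"
    return "none"

print(shift_direction((1, 1), (1, 2)))  # right, as in FIG. 5B -> 5C
print(shift_direction((1, 2), (2, 2)))  # down,  as in FIG. 5G -> 5H
print(shift_direction((2, 2), (2, 1)))  # left,  as in FIG. 5K
```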
Each of the plurality of second processing elements may perform an inter-element operation between a first non-zero element and a second non-zero element input by the same operation method as the operation of the plurality of first processing elements in the previous cycle.
As shown in fig. 5C, processor 120 may input a second non-zero element to the plurality of first processing elements in a third cycle. Here, the input second non-zero element is a second non-zero element of the first depth included in the first row and the second column of the first core data.
Each of the plurality of first processing elements may shift a second non-zero element input in the second cycle into an adjacent second processing element. In addition, each of the plurality of second processing elements may shift a second non-zero element input in the second cycle to a processing element (not shown) adjacent to the lower side of the input in the second cycle.
In other words, a second non-zero element that was input to or shifted into the plurality of first processing elements and the plurality of second processing elements in a previous cycle is, when the cycle changes, shifted to the processing element below together with the newly input second non-zero element. Since the same operation is repeated, a description of the shifting of the second non-zero element will be omitted below.
Each of the plurality of first processing elements may perform an inter-element operation on a first non-zero element of the input and a second non-zero element of the input. Each of the plurality of first processing elements may add the operation result shifted from the second cycle and the operation result of the third cycle and store the added operation result.
Each of the plurality of second processing elements may perform an inter-element operation between the input first non-zero element and the input second non-zero element input in the same operation method as the operation of the plurality of first processing elements in the previous cycle, and shift the operation result to the right.
That is, each of the plurality of second processing elements may operate in the same manner as the operation of the plurality of first processing elements in the previous cycle. Hereinafter, unless otherwise specified, the operations of the plurality of second processing elements are the same as those of the plurality of first processing elements in the previous cycle.
Figs. 5D, 5E, and 5F illustrate the operations according to the inputs of the second non-zero elements of the second depth, the third depth, and the fourth depth, respectively, included in the first row and the second column of the first core data. The operations are the same as the operations described above, and thus a detailed description is omitted.
Referring to fig. 5G, processor 120 may input a second non-zero element to the plurality of first processing elements in a seventh cycle. Here, the input second non-zero element is a second non-zero element of a fifth depth included in the first row and the second column of the first kernel data.
Each of the plurality of first processing elements may perform an inter-element operation on a first non-zero element of the input and a second non-zero element of the input. Each of the plurality of first processing elements may add and shift the operation result of the seventh cycle and the operation result of the sixth cycle to an adjacent second processing element.
As described above, in the next cycle, the second non-zero element of the first depth included in the second row and the second column of the first core data (where the second non-zero element corresponds to the lower side of the first row and the second column of the first core data) is to be input, and the shift direction may be downward. Each of the plurality of second processing elements may perform an inter-element operation between a first non-zero element of the input and a second non-zero element of the input.
Each of the plurality of second processing elements may store the operation result shifted from the adjacent first processing element separately from the operation result in the seventh cycle. That is, the operation result shifted from the processing element adjacent to the upper side in the downward direction is not added to the operation result of the current cycle.
Referring to fig. 5H, processor 120 may input a second non-zero element to the plurality of first processing elements in an eighth cycle. The second non-zero element of the input is a second non-zero element of the first depth included in the second row and the second column of the first kernel data.
Each of the plurality of first processing elements may perform an inter-element operation between a first non-zero element of the input and a second non-zero element of the input.
Each of the plurality of second processing elements may perform an inter-element operation on the first non-zero element of the input and the second non-zero element of the input. Each of the plurality of second processing elements may add the operation result in the seventh period to the operation result in the eighth period, and shift the added operation result to a processing element adjacent to the lower side. However, the operation result shifted from the processing element adjacent to the upper side in the seventh cycle may be stored in each of the plurality of second processing elements as it is.
Referring to fig. 5I, processor 120 may input a second non-zero element to the plurality of first processing elements in a ninth cycle. The second non-zero element of the input is a second non-zero element of a second depth included in a second row and a second column of the first kernel data.
Each of the plurality of first processing elements performs an inter-element operation between an input first non-zero element and an input second non-zero element, and stores an added operation result by adding an operation result of a previous cycle and an operation result of a current cycle.
Each of the plurality of second processing elements performs an inter-element operation between the input first non-zero element and the input second non-zero element, adds an operation result shifted from a processing element adjacent to the upper side in the seventh cycle to an operation result of the current cycle, and stores the added operation result.
Figs. 5J and 5K illustrate operations according to inputs of the second non-zero elements of the third depth and the fifth depth included in the second row and the second column of the first kernel data. The operation method, the addition method, and the shift method are the same as described above, and thus a detailed description is omitted.
However, as shown in fig. 5K, the added operation result may be shifted to the left. That is, the shift direction of the added result in fig. 5K may be opposite to the shift direction of the added result in fig. 5B.
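The per-cycle behavior walked through in figs. 5G to 5K can be summarized by the following minimal behavioral sketch in Python. The class and its method names are illustrative only and do not come from the patent; the sketch assumes each processing element holds its first non-zero elements indexed by depth, accumulates products while the incoming kernel elements stay at the same kernel row and column, and hands the accumulated result to a neighbor when the kernel position changes.

```python
# Minimal behavioral sketch of one processing element (illustrative names only).
class ProcessingElement:
    def __init__(self, stored_activations):
        # stored_activations: dict mapping depth -> stored first non-zero element
        self.stored = stored_activations
        self.partial_sum = 0      # running sum for the current kernel row/column
        self.shifted_in = []      # results received from an adjacent element,
                                  # kept separate from the running sum (fig. 5G)

    def cycle(self, kernel_value, kernel_depth, last_of_position):
        """Process one incoming second non-zero element."""
        activation = self.stored.get(kernel_depth, 0)   # depth-matched operand
        self.partial_sum += activation * kernel_value   # inter-element operation
        if last_of_position:
            # The next kernel element belongs to a different row/column, so the
            # accumulated result is shifted out and a new accumulation begins.
            out, self.partial_sum = self.partial_sum, 0
            return out
        return None

    def receive_shift(self, value):
        # A result shifted in from a neighbor is stored as is; it is only combined
        # with a later result, as in the ninth cycle of fig. 5I.
        self.shifted_in.append(value)
```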
Referring to fig. 5L, the processor 120 may input a zero to the plurality of first processing elements in a twelfth cycle. Because there is no second non-zero element in the second row and the first column of the first kernel data, the processor 120 may input a zero to the plurality of first processing elements.
In fig. 5L, because the second non-zero element to be input in the next cycle belongs to the second kernel data, no shift is required. However, if the second non-zero element to be input in the next cycle still belonged to the first kernel data, the shift would be performed. In that case, the processor 120 would input a zero to the plurality of first processing elements and shift the operation result stored in each of the plurality of first processing elements to the adjacent processing element.
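The zero-insertion rule of fig. 5L can be captured in a short sketch that builds the per-cycle input stream for the first-row processing elements. The dictionary layout and traversal order are assumptions made only for illustration: for every kernel row and column position, the non-zero elements are emitted depth by depth, and a single zero is emitted when the position has no non-zero element at all.

```python
# Illustrative sketch: build the per-cycle kernel input stream (not the patent's exact logic).
def kernel_stream(kernel, rows, cols, depths):
    """kernel: dict {(row, col, depth): value} holding only second non-zero elements."""
    stream = []
    for r in range(rows):
        for c in range(cols):
            nonzeros = [((r, c, d), kernel[(r, c, d)])
                        for d in range(depths) if (r, c, d) in kernel]
            if nonzeros:
                stream.extend(nonzeros)           # one non-zero element per cycle
            else:
                stream.append(((r, c, None), 0))  # no non-zero element here:
                                                  # a zero is fed, as in fig. 5L
    return stream
```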
Referring to fig. 5M, the processor 120 may input a second non-zero element to the plurality of first processing elements in a thirteenth cycle. The input second non-zero element is the second non-zero element of the second depth included in the first row and the first column of the second kernel data. The operations of the plurality of first processing elements and the plurality of second processing elements are the same as those described above.
By using the method illustrated in figs. 5A to 5M, a continuous convolution operation can be performed on a plurality of kernel data. Here, the processor 120 may output an operation result of the first kernel data.
Although fig. 5A to 5M illustrate a plurality of processing elements in the form of a 4 × 4 matrix, the present disclosure is not limited thereto, and the number of processing elements may vary.
In addition, although the target data has been described in the form of 4 × 4 × 5, the target data is not limited thereto, and the target data may be in any other form. For example, when the target data is in the form of 16 × 16 × 5 and a plurality of processing elements in the form of a 4 × 4 matrix are used, the processor 120 may divide the target data into four parts based on rows and columns of the target data and may perform a convolution operation.
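A sketch of the row- and column-wise division mentioned above is shown below. How each part is subsequently mapped onto the 4 × 4 processing-element array is not detailed here, so the code only illustrates the split of the target data into four parts; the function name and the numpy-based layout are assumptions.

```python
import numpy as np

def split_into_four_parts(target):
    """Divide target data of shape (H, W, D) into four parts along rows and columns."""
    h, w, _ = target.shape
    return [target[:h // 2, :w // 2, :], target[:h // 2, w // 2:, :],
            target[h // 2:, :w // 2, :], target[h // 2:, w // 2:, :]]

# e.g. 16 x 16 x 5 target data -> four 8 x 8 x 5 parts, convolved one after another
parts = split_into_four_parts(np.random.rand(16, 16, 5))
```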
Fig. 6A and 6B illustrate a method of processing data sparsity of core data according to an embodiment.
If the processor 120 identifies a depth at which there is no first non-zero element in all rows and columns among the first non-zero elements stored in each of the plurality of processing elements, the processor may omit input of the second non-zero elements corresponding to that depth among the plurality of second elements and sequentially input the second non-zero elements not corresponding to that depth to each of the plurality of first processing elements.
For example, as shown in fig. 6A, the processor 120 may identify that there is no first non-zero element corresponding to the second depth among the first non-zero elements stored in each of the plurality of processing elements. In this case, the processor 120 may remove the second non-zero elements of the second depth included in the first kernel data and the second kernel data and sequentially input the remaining second non-zero elements to the plurality of first processing elements.
The processor 120 may remove the second non-zero elements of the second depth included in the first kernel data and the second kernel data, separately store the remaining second non-zero elements in the memory 110, and sequentially read out the remaining second non-zero elements to be input to the plurality of first processing elements. Alternatively, the processor 120 may sequentially extract second non-zero elements from the first kernel data and the second kernel data, skip a second non-zero element when it is identified as belonging to the second depth, and extract and input only the second non-zero elements not belonging to the second depth to the plurality of first processing elements.
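The fig. 6A behavior amounts to filtering the kernel stream by the set of depths that actually occur among the stored first non-zero elements. A minimal sketch, assuming both operands are kept as dictionaries keyed by (row, column, depth):

```python
# Illustrative sketch of fig. 6A: drop kernel (second) non-zero elements whose depth
# never occurs among the stored target (first) non-zero elements.
def prune_kernel_by_present_depths(activations, kernel):
    present_depths = {d for (_, _, d) in activations}   # depths with at least one activation
    return {pos: val for pos, val in kernel.items() if pos[2] in present_depths}
```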
Alternatively, as shown in fig. 6B, processor 120 may identify a depth at which there is no first non-zero element in all rows and all columns before inputting the first non-zero element to each of the plurality of processing elements.
Fig. 7A and 7B illustrate a method for processing data sparsity of target data according to an embodiment.
If a depth for which the number of first non-zero elements in all rows and columns is within a predetermined number is identified from among the first non-zero elements stored in each of the plurality of processing elements, the processor 120 may omit input of the second non-zero elements corresponding to the identified depth among the second elements and sequentially input the second non-zero elements not corresponding to that depth to each of the plurality of first processing elements.
For example, as shown in fig. 7A, when a second depth having fewer than three first non-zero elements in all rows and columns is identified among the first non-zero elements stored in each of the plurality of processing elements, the processor 120 may omit input of the second non-zero elements 720 corresponding to the second depth among the second elements, and may sequentially input the second non-zero elements not corresponding to the second depth to each of the plurality of first processing elements.
In this case, the first non-zero elements of the identified depth may still be stored in some of the plurality of processing elements, but no operation is performed on them because the second non-zero elements 720 of the identified depth are not input; thus, the number of cycles may be reduced. The cycle reduction is the same as in the case shown in figs. 6A and 6B.
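A short sketch of the fig. 7A test follows; the per-depth count and the threshold value are the only inputs, and the helper name is an assumption made for illustration.

```python
from collections import Counter

# Illustrative sketch of fig. 7A: find depths whose number of first non-zero elements
# across all rows and columns falls below the predetermined number (three in the example).
def sparse_depths(activations, all_depths, predetermined_number=3):
    counts = Counter(d for (_, _, d) in activations)
    return {d for d in all_depths if counts[d] < predetermined_number}
```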
The processor 120 may further include a plurality of preliminary processing elements, and may input a first non-zero element corresponding to the identified depth and a second non-zero element corresponding to the identified depth to the plurality of preliminary processing elements to perform separate operations.
For example, as shown in fig. 7B, the processor 120 may further include a plurality of preliminary processing elements 730, and the first non-zero element 710 corresponding to the identified depth and the second non-zero element 720 corresponding to the identified depth may be input to the plurality of preliminary processing elements 730 to perform separate operations.
In other words, processor 120 may use multiple processing elements to perform the operations shown in fig. 5A-5M and use multiple preliminary processing elements 730 to operate in parallel on first non-zero elements 710 corresponding to the identified depth and second non-zero elements 720 corresponding to the identified depth.
Thereafter, the processor 120 may add the operation results output from the plurality of preliminary processing elements 730 to the corresponding operation results among the operation results output from the plurality of processing elements.
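Functionally, the fig. 7B arrangement partitions the work by depth and adds the two partial results at the end. The sketch below is purely illustrative: naive_conv stands in for the dataflow of either array and computes only a single output position, but it shows why the per-depth split followed by addition reproduces the full result.

```python
# Illustrative sketch of fig. 7B: sparse depths go to the preliminary processing
# elements, the remaining depths go to the main array, and the outputs are added.
def naive_conv(activations, kernel):
    """Stand-in for either array: activations, kernel are dicts {(row, col, depth): value}."""
    return sum(val * kernel.get(pos, 0) for pos, val in activations.items())

def convolve_with_preliminary_pes(activations, kernel, sparse):
    main_kernel = {p: v for p, v in kernel.items() if p[2] not in sparse}
    prel_kernel = {p: v for p, v in kernel.items() if p[2] in sparse}
    main_out = naive_conv(activations, main_kernel)   # main processing elements
    prel_out = naive_conv(activations, prel_kernel)   # preliminary processing elements 730
    return main_out + prel_out                        # results added, as described above
```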
FIG. 8 illustrates a processing element according to an embodiment.
Referring to fig. 8, the processing element includes a kernel terminal 811, a feature map terminal 812, a PSum terminal 813, a bottom Acc terminal 814, a left Acc terminal 821, a right Acc terminal 822, a Ctrl_Inst terminal 823, a left Acc terminal 831, a right Acc terminal 832, a kernel terminal 841, a PSum terminal 842, a bottom Acc terminal 843, a register file 850, a multiplier 860, a multiplexer 870, and an adder 880.
The processing element may receive the second non-zero element, the first non-zero element, and data and instructions stored in the memory 110 through the kernel terminal 811, the feature map terminal 812, the PSum terminal 813, and the Ctrl_Inst terminal 823, respectively. In addition, the processing element may shift a second non-zero element to the lower adjacent processing element through the kernel terminal 841. In particular, the processing element may receive data from or output data directly to the memory 110 using the PSum terminals 813 and 842.
The processing element may receive operation results from adjacent processing elements through the bottom Acc terminal 814, the right Acc terminal 822, and the left Acc terminal 831. In addition, the processing element may shift its own operation result to an adjacent processing element through the left Acc terminal 821, the right Acc terminal 832, and the bottom Acc terminal 843.
The multiplexer 870 may provide, to the adder 880, one of the following: the operation result input from an adjacent processing element, the operation result processed in the processing element, the data input from the PSum terminal 813, and the data input from the register file 850.
The adder 880 may add the multiplication result input from the multiplier 860 to the data input from the multiplexer 870.
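A behavioral sketch of this datapath is given below. The class, parameter, and dictionary names are descriptive stand-ins rather than signal names from fig. 8; the sketch only mirrors the multiply-select-add structure just described.

```python
# Behavioral sketch of the fig. 8 datapath (names are illustrative stand-ins).
class PEDatapath:
    def __init__(self):
        self.register_file = {"acc": 0}

    def step(self, kernel_in, feature_in, psum_in=0, neighbor_acc=0,
             local_acc=0, select="local"):
        product = kernel_in * feature_in                 # multiplier 860
        mux_inputs = {                                   # multiplexer 870
            "neighbor": neighbor_acc,                    # result shifted from a neighbor
            "local": local_acc,                          # result processed in this element
            "psum": psum_in,                             # data from the PSum terminal
            "register": self.register_file["acc"],       # data from the register file
        }
        result = product + mux_inputs[select]            # adder 880
        self.register_file["acc"] = result               # write back for later cycles
        return result
```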
The processing element may further include additional multiplexers.
Fig. 9 is a flowchart illustrating a method of controlling an electronic device according to an embodiment. For example, the electronic device may include a processor that performs deep learning, a memory that stores target data and kernel data, and a plurality of processing elements arranged in a matrix form.
Referring to fig. 9, in step S910, a first non-zero element of a plurality of first elements included in target data is input to each of a plurality of processing elements.
In step S920, a second non-zero element of the plurality of second elements included in the kernel data is sequentially input to each of a plurality of first processing elements included in a first row of the plurality of processing elements.
In step S930, in each of the plurality of first processing elements, an operation between the input first non-zero element and the input second non-zero element is performed based on the depth information of the first non-zero element and the depth information of the second non-zero element.
Each of the plurality of processing elements includes a plurality of register files, and inputting the first non-zero element at step S910 may include identifying a corresponding processing element from the plurality of processing elements based on row information and column information of the first non-zero element, and inputting the first non-zero element to a corresponding register file from the plurality of register files included in the identified processing element.
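Step S910 can be pictured with the following small sketch, in which a nested list of dictionaries stands in for the processing-element array and its register files; the layout and function name are assumptions made only for illustration.

```python
# Illustrative sketch of step S910: route each first non-zero element to the PE chosen
# by its row and column, and to the register-file entry chosen by its depth.
def load_first_nonzero_elements(nonzero_elements, pe_rows, pe_cols):
    """nonzero_elements: iterable of ((row, col, depth), value)."""
    pes = [[{} for _ in range(pe_cols)] for _ in range(pe_rows)]
    for (row, col, depth), value in nonzero_elements:
        pes[row][col][depth] = value   # register file indexed by depth
    return pes
```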
The step S920 of sequentially inputting the second non-zero elements may include: the second non-zero elements are sequentially input to the plurality of first processing elements based on their row information, column information, and depth information.
The step S920 of sequentially inputting the second non-zero elements may include: sequentially inputting, based on the depth, second non-zero elements included in one row and one column among the second non-zero elements to each of the plurality of first processing elements, and when all of the second non-zero elements included in the one row and the one column have been input to each of the plurality of first processing elements, inputting second non-zero elements included in a row and a column different from the one row and the one column to each of the plurality of first processing elements.
In addition, the step S920 of sequentially inputting the second non-zero elements may include: when a second non-zero element is not present in the one row and the one column, inputting a zero to each of the plurality of first processing elements, and when a zero has been input to each of the plurality of first processing elements, inputting a second non-zero element or a zero included in another row and another column to each of the plurality of first processing elements based on the number of second non-zero elements included in that other row and column.
The step S920 of sequentially inputting the second non-zero elements may include: when a depth at which there is no first non-zero element in all rows and columns is identified from among the first non-zero elements stored in each of the plurality of processing elements, omitting input of the second non-zero elements corresponding to that depth among the plurality of second elements and sequentially inputting the second non-zero elements not corresponding to that depth to each of the plurality of first processing elements.
In addition, the step S920 of sequentially inputting the second non-zero elements may include: when a depth for which the number of first non-zero elements in all rows and columns is within a predetermined number is identified from among the first non-zero elements stored in each of the plurality of processing elements, omitting input of the second non-zero elements corresponding to that depth among the plurality of second elements, sequentially inputting the second non-zero elements not corresponding to that depth to each of the plurality of first processing elements, and inputting the first non-zero elements corresponding to that depth and the second non-zero elements corresponding to that depth to a plurality of preliminary processing elements included in the processor.
When the operations between the elements are completed in the plurality of first processing elements, the input second non-zero element may be shifted to each of a plurality of second processing elements included in the second row. When the operations between the elements are completed in the plurality of second processing elements, the shifted second non-zero element may be shifted from the plurality of second processing elements to each of a plurality of third processing elements included in the third row.
When a second non-zero element input to each of the plurality of processing elements belongs to the same row and the same column as the second non-zero element used immediately before, the operation result of the input second non-zero element may be accumulated with the previous operation result, and the accumulated result may be stored in one of the plurality of register files.
If the second non-zero element input to each of the plurality of processing elements does not belong to the same row and the same column as the second non-zero element used immediately before, the operation result stored in one of the plurality of register files of each of the plurality of processing elements may be shifted to an adjacent processing element, and the operation result of the input second non-zero element may be accumulated with the shifted operation result and then stored in one of the plurality of register files.
According to the various embodiments of the present disclosure described above, an electronic device may increase the speed of convolution operations by omitting, from the operation, the zeros included in the target data and the kernel data.
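As a rough, purely illustrative way to see where the speed-up comes from, the sketch below counts cycles under the simplifying assumption that one kernel element (or one padding zero) is fed to the first-row processing elements per cycle; this counting model is an assumption, not a figure from the disclosure.

```python
# Back-of-the-envelope cycle counts under the one-element-per-cycle assumption.
def cycles_dense(rows, cols, depths):
    return rows * cols * depths        # every kernel element is fed, zero or not

def cycles_sparse(kernel, rows, cols):
    """kernel: dict {(row, col, depth): value} of second non-zero elements only."""
    occupied = {(r, c) for (r, c, _) in kernel}
    empty = rows * cols - len(occupied)   # each empty position still costs one zero cycle
    return len(kernel) + empty

# e.g. a 2 x 2 x 5 kernel with 9 non-zero elements spread over all four positions:
# 20 cycles dense vs. 9 cycles when only the non-zero elements are fed.
```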
The target data and the kernel data described above may be any form of three-dimensional data. In addition, the number of processing elements included in the processor may also vary.
In accordance with embodiments of the present disclosure, the various embodiments described above may be implemented as software including instructions stored in a machine-readable storage medium readable by a machine (e.g., a computer). A machine that calls the stored instructions and can operate according to the called instructions may include an electronic device according to the embodiments described above. When an instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly or by using other components under the control of the processor. An instruction may include code generated or executed by a compiler or an interpreter.
The machine-readable storage medium may be provided in the form of a non-transitory storage medium.
According to embodiments of the present disclosure, the methods according to the various embodiments described above may be provided in a computer program product. The computer program product may be traded as a commodity between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)), or may be distributed online through an application store (e.g., Play Store™). For online distribution, at least a part of the computer program product may be at least temporarily stored, or temporarily generated, in a storage medium such as a server of a manufacturer, a server of an application store, or a memory of a relay server.
Furthermore, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device, using software, hardware, or a combination thereof. In some cases, the embodiments described herein may be implemented by the processor itself. According to a software implementation, embodiments such as the procedures and functions described herein may be implemented as separate software modules. Each software module may perform one or more of the functions and operations described herein.
Computer instructions for performing processing operations of the device according to the various embodiments described above may be stored in a non-transitory computer readable medium. Computer instructions stored in such a non-transitory computer readable medium, when executed by a processor of a particular apparatus, cause the particular apparatus to perform the processing operations of the device according to the various embodiments described above. A non-transitory computer readable medium is not a medium that stores data for a short time, such as a register, a cache, or a memory, but a medium that stores data semi-permanently and is readable by a device. Specific examples of non-transitory computer readable media include CDs, DVDs, hard disks, Blu-ray discs, USB memories, memory cards, ROMs, and the like.
Further, each component (e.g., a module or a program) according to the various embodiments described above may include a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into one entity to perform the same or similar functions performed by each respective component prior to integration. Operations performed by a module, a program, or another component may be performed sequentially, in parallel, iteratively, or heuristically, or at least some of the operations may be performed in a different order, in accordance with the various embodiments.
While the disclosure has been shown and described with reference to certain embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.
Claims (15)
1. An electronic device for performing deep learning, the electronic device comprising:
a memory configured to store target data and kernel data; and
a processor configured to include a plurality of processing elements arranged in a matrix shape, wherein the processor is configured to:
input a first non-zero element of a plurality of first elements included in the target data to each of the plurality of processing elements, and
sequentially input a second non-zero element of a plurality of second elements included in the kernel data to each of a plurality of first processing elements included in a first row of the plurality of processing elements,
wherein each of the plurality of first processing elements is configured to perform an operation between the input first non-zero element and the input second non-zero element based on the depth information of the first non-zero element and the depth information of the second non-zero element.
2. The electronic device of claim 1, wherein each of the plurality of processing elements comprises a plurality of register files, and
wherein the processor is further configured to:
identify a corresponding processing element from the plurality of processing elements based on the row information and the column information of the first non-zero element, and
input the first non-zero element to a corresponding register file of the plurality of register files included in the identified processing element based on the depth information of the first non-zero element.
3. The electronic device of claim 2, wherein the processor is further configured to sequentially input a second non-zero element to each of the plurality of first processing elements based on row information, column information, and depth information of the second non-zero element.
4. The electronic device of claim 3, wherein the processor is further configured to:
sequentially input, based on the depth, second non-zero elements included in one row and one column among the second non-zero elements to each of the plurality of first processing elements, and
when all second non-zero elements included in the one row and the one column are input to each of the plurality of first processing elements, input second non-zero elements included in a row and a column different from the one row and the one column to each of the plurality of first processing elements.
5. The electronic device of claim 4, wherein the processor is further configured to:
when a second non-zero element is not present in the one row and the one column, input a zero to each of the plurality of first processing elements, and
when the zero is input to each of the plurality of first processing elements, input second non-zero elements included in a different row and column to each of the plurality of first processing elements based on the number of second non-zero elements included in the different row and column.
6. The electronic device of claim 3, wherein the processor is further configured to: when a depth at which there is no first non-zero element in all rows and columns is identified from among the first non-zero elements stored in each of the plurality of processing elements, omit input of second non-zero elements corresponding to the depth among the second elements, and sequentially input second non-zero elements not corresponding to the depth to each of the plurality of first processing elements.
7. The electronic device of claim 3, wherein the processor further comprises a plurality of preliminary processing elements, and
wherein the processor is further configured to:
when the depth within a predetermined number of non-zero elements in all rows and columns corresponding to the depth is identified from among the first non-zero elements stored in each of the plurality of processing elements, the input of the second non-zero element corresponding to the depth is omitted, and the second non-zero elements not corresponding to the depth are sequentially input to each of the plurality of first processing elements, and
a first non-zero element corresponding to the depth and a second non-zero element corresponding to the depth are input to a plurality of preliminary processing elements to perform an operation.
8. The electronic device of claim 3, wherein the processor is further configured to:
controlling the plurality of processing elements to shift a second non-zero element input to the plurality of first processing elements to each of a plurality of second processing elements included in a second row when an operation between non-zero elements in the plurality of first processing elements is completed, and
controlling the plurality of processing elements to shift a second non-zero element shifted to the plurality of second processing elements to each of a plurality of third processing elements included in a third row of the plurality of processing elements when an operation between non-zero elements is completed in the plurality of second processing elements.
9. The electronic device of claim 8, wherein the processor is further configured to: when a second non-zero element input to each of the plurality of processing elements belongs to the same row and the same column as a second non-zero element used immediately before, accumulate an operation result of the input second non-zero element with a previous operation result, and store the accumulated operation result in one of the plurality of register files.
10. The electronic device of claim 8, wherein the processor is further configured to: when a second non-zero element input to each of the plurality of processing elements does not belong to the same row and the same column as a second non-zero element used immediately before for an operation, shift an operation result stored in one of the plurality of register files of each of the plurality of processing elements to an adjacent processing element, accumulate the operation result of the input second non-zero element with the shifted operation result, and store the accumulated operation result in one of the plurality of register files.
11. A method of controlling an electronic device to perform deep learning, wherein the electronic device comprises a processor, wherein the processor comprises a plurality of processing elements arranged in a matrix shape, the method comprising:
inputting a first non-zero element of a plurality of first elements included in the target data to each of the plurality of processing elements;
sequentially inputting a second non-zero element of a plurality of second elements included in the kernel data to each of a plurality of first processing elements included in a first row of the plurality of processing elements; and
performing an operation between the input first non-zero element and the input second non-zero element based on the depth information of the first non-zero element and the depth information of the second non-zero element.
12. The method of claim 11, wherein each of the plurality of processing elements comprises a plurality of register files, and
wherein the step of inputting the first non-zero element comprises:
identifying a corresponding processing element from the plurality of processing elements based on row information and column information of the first non-zero element; and
inputting the first non-zero element to a corresponding register file of the plurality of register files included in the identified processing element based on the depth information of the first non-zero element.
13. The method of claim 12, wherein the step of sequentially inputting the second non-zero elements comprises: sequentially inputting a second non-zero element to each of the plurality of first processing elements based on row information, column information, and depth information of the second non-zero element.
14. The method of claim 13, wherein the step of sequentially inputting the second non-zero elements comprises:
sequentially inputting, based on the depth, second non-zero elements included in one row and one column among the second non-zero elements to each of the plurality of first processing elements; and
when all second non-zero elements included in the one row and the one column are input to each of the plurality of first processing elements, inputting second non-zero elements included in a row and a column different from the one row and the one column to each of the plurality of first processing elements.
15. The method of claim 14, wherein the step of sequentially inputting the second non-zero elements comprises:
inputting a zero to each of the plurality of first processing elements when a second non-zero element is not present in the one row and the one column; and
when the zero is input to each of the plurality of first processing elements, inputting second non-zero elements included in a different row and column to each of the plurality of first processing elements based on the number of second non-zero elements included in the different row and column.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762571599P | 2017-10-12 | 2017-10-12 | |
US62/571,599 | 2017-10-12 | ||
KR10-2018-0022960 | 2018-02-26 | ||
KR1020180022960A KR102704647B1 (en) | 2017-10-12 | 2018-02-26 | Electronic apparatus and control method thereof |
PCT/KR2018/006509 WO2019074185A1 (en) | 2017-10-12 | 2018-06-08 | Electronic apparatus and control method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111095304A true CN111095304A (en) | 2020-05-01 |
Family
ID=66282988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880057625.8A Pending CN111095304A (en) | 2017-10-12 | 2018-06-08 | Electronic device and control method thereof |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3659073A4 (en) |
KR (1) | KR102704647B1 (en) |
CN (1) | CN111095304A (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102697300B1 (en) * | 2018-03-07 | 2024-08-23 | 삼성전자주식회사 | Electronic apparatus and control method thereof |
KR20210111014A (en) * | 2020-03-02 | 2021-09-10 | 삼성전자주식회사 | Electronic apparatus and method for controlling thereof |
KR102565826B1 (en) * | 2020-12-29 | 2023-08-16 | 한양대학교 산학협력단 | 3D object recognition method and apparatus that improves the speed of convolution operation through data reuse |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1923793A2 (en) * | 2006-10-03 | 2008-05-21 | Sparsix Corporation | Memory controller for sparse data computation system and method therefor |
EP2657842A1 (en) * | 2012-04-23 | 2013-10-30 | Fujitsu Limited | Workload optimization in a multi-processor system executing sparse-matrix vector multiplication |
US20140298351A1 (en) * | 2013-03-29 | 2014-10-02 | Fujitsu Limited | Parallel operation method and information processing apparatus |
CN104915322A (en) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | Method for accelerating convolution neutral network hardware and AXI bus IP core thereof |
KR20170052432A (en) * | 2015-10-30 | 2017-05-12 | 세종대학교산학협력단 | Calcuating method and apparatus to skip operation with respect to operator having value of zero as operand |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
Non-Patent Citations (5)
Title |
---|
BAOYUAN LIU; MIN WANG; HASSAN FOROOSH; MARSHALL TAPPEN; MARIANNA PENKSY: "Sparse Convolutional Neural Networks", 《2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, 15 October 2015 (2015-10-15), pages 1 - 9 *
DONGYOUNG KIM; JUNWHAN AHN; SUNGJOO YOO: "A novel zero weight/activation-aware hardware architecture of convolutional neural network", 《DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017》, pages 1462 - 1467 *
JORGE ALBERICIO; PATRICK JUDD; TAYLER HETHERINGTON; TOR AAMODT; NATALIE ENRIGHT JERGER; ANDREAS MOSHOVOS: "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing", 《2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA)》, 25 August 2016 (2016-08-25), pages 1 - 13 *
LEONID YAVITS, RAN GINOSAR: "Sparse Matrix Multiplication on CAM Based Accelerator", 《ARXIV》, pages 1 - 5 *
YU-HSIN CHEN; TUSHAR KRISHNA; JOEL S. EMER; VIVIENNE SZE: "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks", 《IEEE JOURNAL OF SOLID-STATE CIRCUITS》, pages 127 - 137 *
Also Published As
Publication number | Publication date |
---|---|
KR102704647B1 (en) | 2024-09-10 |
EP3659073A4 (en) | 2020-09-30 |
EP3659073A1 (en) | 2020-06-03 |
KR20190041388A (en) | 2019-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11551280B2 (en) | Method, manufacture, and system for recommending items to users | |
JP6961011B2 (en) | Systems and methods for data management | |
CN106097353B (en) | Method for segmenting objects and device, computing device based on the fusion of multi-level regional area | |
US20220414439A1 (en) | Neuromorphic Synthesizer | |
CN112002309A (en) | Model training method and apparatus | |
EP3528181B1 (en) | Processing method of neural network and apparatus using the processing method | |
CN111382859A (en) | Method and apparatus for processing convolution operations in a neural network | |
US11636712B2 (en) | Dynamic gesture recognition method, device and computer-readable storage medium | |
CN113469354B (en) | Memory-constrained neural network training | |
JP6822581B2 (en) | Information processing equipment, information processing methods and programs | |
CN111819581B (en) | Electronic apparatus and control method thereof | |
CN111095304A (en) | Electronic device and control method thereof | |
CN113939801A (en) | Reducing the computational load of neural networks using self-correcting codes | |
CN109284782A (en) | Method and apparatus for detecting feature | |
CN112668689A (en) | Method and apparatus for multimedia data processing | |
US20190114542A1 (en) | Electronic apparatus and control method thereof | |
Sikka | Elements of Deep Learning for Computer Vision: Explore Deep Neural Network Architectures, PyTorch, Object Detection Algorithms, and Computer Vision Applications for Python Coders (English Edition) | |
US20210174179A1 (en) | Arithmetic apparatus, operating method thereof, and neural network processor | |
KR20220134035A (en) | Processing-in-memory method for convolution operations | |
CN114595811A (en) | Method and apparatus for performing deep learning operations | |
KR20220161339A (en) | Feature reordering based on similarity for improved memory compression transfer in machine learning tasks | |
CN105117330B (en) | CNN code test methods and device | |
CN111931841A (en) | Deep learning-based tree processing method, terminal, chip and storage medium | |
CN112733536A (en) | Word embedding method and device and word searching method | |
CN111507456A (en) | Method and apparatus with convolutional neural network processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |