WO2024158174A1

WO2024158174A1 - Electronic device and method for quantization of operator related to computation of model

Info

Publication number: WO2024158174A1
Application number: PCT/KR2024/000928
Authority: WO
Inventors: 이은택; 김정배; 이현수; 김무영; 나보연
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2023-01-25
Filing date: 2024-01-19
Publication date: 2024-08-02

Abstract

According to one embodiment, an electronic device may comprise a memory and at least one processor. The at least one processor can obtain, from the memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in the model. The at least one processor can obtain a set of sub-output data corresponding to a set of input data used for computation of the first operator from profile information stored in the memory and generated by execution of the model. The at least one processor can obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set on the basis of the set of sub-output data. The at least one processor can store the second weight set in the memory on the basis of obtaining the second weight set. Various other embodiments are possible.

Description

Electronic device and method for quantization of operators involved in the calculation of a model

The descriptions below relate to electronic devices and methods for quantization of operators related to model operations.

Recently, applications incorporating artificial intelligence (AI) technology have been developed. Applications may use weights of various data types depending on the required precision. Methods are being studied that allow an application whose weights use a high-precision data type to be computed on an accelerator that supports a low-precision data type.

The above information may be provided as background art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing can be applied as prior art to the present disclosure.

According to one embodiment, an electronic device may include a memory and at least one processor. The at least one processor may obtain, from the memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model. The at least one processor may obtain, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator. The at least one processor may obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data. The at least one processor may store the second weight set in the memory based on obtaining the second weight set.

According to one embodiment, a method performed by an electronic device may include an operation of obtaining, from a memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model. The method may include an operation of obtaining, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator. The method may include an operation of obtaining, based on the set of sub-output data, a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set. The method may include an operation of storing the second weight set in the memory based on obtaining the second weight set.

According to one embodiment, in a computer-readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to obtain, from a memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model. The one or more programs may include instructions that cause the electronic device to obtain, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator. The one or more programs may include instructions that cause the electronic device to obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data. The one or more programs may include instructions that cause the electronic device to store the second weight set in the memory based on obtaining the second weight set.

According to one embodiment, in a computer-readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to obtain, from a memory, a first set of weights of a first data type included in a first operator, which is one of at least one operator included in a model. The one or more programs may include instructions that cause the electronic device to obtain, from profile information generated by execution of the model and stored in the memory, a set of sub-output data corresponding to a set of input data used for computation of the first operator. The one or more programs may include instructions that cause the electronic device to perform, based on the set of sub-output data, quantization on the first set of weights to obtain a second set of weights of a second data type supported by at least one accelerator. The one or more programs may include instructions that cause the electronic device to store the second set of weights in the memory based on obtaining the second set of weights.

As described above, according to one embodiment, an electronic device may include a memory that stores instructions, and at least one processor. The instructions, when executed by the at least one processor, may cause the electronic device to: obtain, from the memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model; obtain, from profile information generated by execution of the model and stored in the memory, a set of sub-output data corresponding to a set of input data used for computation of the first operator; obtain, based on the set of sub-output data, a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set; and store the second weight set in the memory based on obtaining the second weight set.
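The four operations summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the summary does not state how the sub-output data steer the quantization, so the sketch assumes one plausible approach, namely trying several clipping ranges and keeping the one whose quantized outputs best match the profiled sub-outputs. The operator is reduced to a single dot product, float weights stand in for the first data type, int8 stands in for the second data type, and the names `memory`, `profile`, `quantize`, and `calibrate` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(xs: Sequence[float], ws: Sequence[float]) -> float:
    """Stand-in for the first operator: a single dot product."""
    return sum(x * w for x, w in zip(xs, ws))


def quantize(weights: Sequence[float], scale: float) -> List[int]:
    """Symmetric int8 quantization: round(w / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(w / scale))) for w in weights]


def output_error(inputs, sub_outputs, q_weights, scale) -> float:
    """Mean squared gap between profiled outputs and the quantized operator's outputs."""
    gaps = [(y - scale * dot(x, q_weights)) ** 2 for x, y in zip(inputs, sub_outputs)]
    return sum(gaps) / len(gaps)


def calibrate(weights, inputs, sub_outputs, candidates: int = 20) -> Tuple[float, List[int]]:
    """Assumed strategy: try several clipping ranges, keep the best match to the profile."""
    max_abs = max(abs(w) for w in weights) or 1.0
    best = None
    for i in range(1, candidates + 1):
        scale = (max_abs * i / candidates) / 127
        q_weights = quantize(weights, scale)
        err = output_error(inputs, sub_outputs, q_weights, scale)
        if best is None or err < best[0]:
            best = (err, scale, q_weights)
    return best[1], best[2]


# Hypothetical memory contents: the first weight set plus profile information.
memory = {
    "first_weight_set": [0.42, -1.30, 0.07, 0.88],
    "profile": {"inputs": [[1.0, 0.5, -2.0, 0.0], [0.2, -0.4, 1.5, 3.0]]},
}
profile = memory["profile"]
# Sub-outputs as they would have been recorded while executing the float model.
profile["sub_outputs"] = [dot(x, memory["first_weight_set"]) for x in profile["inputs"]]

# Obtain the second weight set and store it back in memory.
scale, values = calibrate(memory["first_weight_set"], profile["inputs"], profile["sub_outputs"])
memory["second_weight_set"] = {"dtype": "int8", "scale": scale, "values": values}
```

Scoring each candidate against recorded outputs, rather than against the weights alone, is what ties the quantization to the profile information in this sketch; other error measures or search strategies would fit the same outline.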

FIG. 1 is a block diagram of an electronic device in a network environment, according to embodiments.

FIG. 2 is a block diagram illustrating one or more processors included in an electronic device according to an embodiment.

FIG. 3 is an exemplary diagram illustrating a neural network running on an electronic device according to an embodiment.

FIG. 4 is an exemplary diagram illustrating quantization of weights according to an embodiment.

FIG. 5 is a block diagram illustrating the operation of a memory included in an electronic device according to an embodiment.

FIG. 6 is a block diagram illustrating reliability evaluation of a model executed in an electronic device according to an embodiment.

FIG. 7 illustrates a flow of operations of an electronic device for storing quantized second weight sets according to an embodiment.

FIG. 8 illustrates a flow of operations for executing quantization using a server according to an embodiment.

FIG. 9 illustrates a flow of operations of an electronic device for acquiring a second set of weights through quantization according to an embodiment.

FIG. 10 illustrates a flow of operations of an electronic device for storing a second set of weights based on reliability evaluation according to an embodiment.

FIG. 11 illustrates a flow of operations of an electronic device for identifying a quantization method according to reliability, according to an embodiment.

Terms used in the present disclosure are merely used to describe specific embodiments and may not be intended to limit the scope of other embodiments. Singular expressions may include plural expressions, unless the context clearly dictates otherwise. Terms used herein, including technical or scientific terms, may have the same meaning as commonly understood by a person of ordinary skill in the technical field described in this disclosure. Among the terms used in this disclosure, terms defined in general dictionaries may be interpreted to have the same or similar meaning as the meaning they have in the context of the related technology, and, unless clearly defined in this disclosure, are not to be interpreted in an idealized or excessively formal sense. In some cases, even terms defined in the present disclosure cannot be interpreted to exclude embodiments of the present disclosure.

In various embodiments of the present disclosure described below, a hardware approach method is explained as an example. However, since various embodiments of the present disclosure include technology using both hardware and software, the various embodiments of the present disclosure do not exclude software-based approaches.

Terms used in the following description that refer to signals (e.g., signal, information, message, signaling), operational states (e.g., step, operation, procedure), data (e.g., packet, user stream, information, bit, symbol, codeword), artificial intelligence (AI) (e.g., neural network, artificial neural network, neural network model, model), weights (e.g., weight), profile information, operators (e.g., layer), network entities, and components of a device are exemplified for convenience of explanation. Accordingly, the present disclosure is not limited to the terms described below, and other terms having equivalent technical meaning may be used.

In addition, terms such as '... unit', '... device', '... object', and '... body' used hereinafter may mean at least one shape structure or a unit that processes a function.

In addition, in the present disclosure, expressions such as 'greater than' or 'less than' may be used to determine whether a specific condition is satisfied or fulfilled, but this is merely an example and does not exclude 'greater than or equal to' or 'less than or equal to'. A condition written as 'greater than or equal to' may be replaced with 'greater than', a condition written as 'less than or equal to' may be replaced with 'less than', and a condition written as 'greater than or equal to and less than' may be replaced with 'greater than and less than or equal to'. In addition, hereinafter, 'A' to 'B' means at least one of the elements from A (inclusive) to B (inclusive). Hereinafter, 'C' and/or 'D' means at least one of 'C' or 'D', that is, one of {'C', 'D', 'C' and 'D'}.

FIG. 1 is a block diagram of an electronic device 101 in a network environment 100, according to various embodiments.

Referring to FIG. 1, in the network environment 100, the electronic device 101 may communicate with the electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or may communicate with at least one of the electronic device 104 or the server 108 through a second network 199 (e.g., a long-distance wireless communication network). According to one embodiment, the electronic device 101 may communicate with the electronic device 104 through the server 108. According to one embodiment, the electronic device 101 may include a processor 120, a memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connection terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197. In some embodiments, at least one of these components (e.g., the connection terminal 178) may be omitted from the electronic device 101, or one or more other components may be added to it. In some embodiments, some of these components (e.g., the sensor module 176, the camera module 180, or the antenna module 197) may be integrated into one component (e.g., the display module 160).

The processor 120 may, for example, execute software (e.g., the program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or operations. According to one embodiment, as at least part of the data processing or computation, the processor 120 may store commands or data received from another component (e.g., the sensor module 176 or the communication module 190) in the volatile memory 132, process the commands or data stored in the volatile memory 132, and store the resulting data in the non-volatile memory 134. According to one embodiment, the processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) or an auxiliary processor 123 (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of, or together with, the main processor 121. For example, if the electronic device 101 includes a main processor 121 and an auxiliary processor 123, the auxiliary processor 123 may be set to use less power than the main processor 121 or to be specialized for a designated function. The auxiliary processor 123 may be implemented separately from the main processor 121 or as part of it.

The auxiliary processor 123 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display module 160, the sensor module 176, or the communication module 190), either on behalf of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 180 or the communication module 190). According to one embodiment, the auxiliary processor 123 (e.g., a neural processing unit) may include a hardware structure specialized for processing artificial intelligence models. Artificial intelligence models may be created through machine learning. Such learning may be performed, for example, in the electronic device 101 itself on which the artificial intelligence model is executed, or through a separate server (e.g., the server 108). Learning algorithms may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but are not limited to these examples. An artificial intelligence model may include multiple artificial neural network layers. The artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or deep Q-networks, or a combination of two or more of the above, but is not limited to these examples. In addition to the hardware structure, the artificial intelligence model may additionally or alternatively include a software structure.

The memory 130 may store various data used by at least one component (e.g., the processor 120 or the sensor module 176) of the electronic device 101. The data may include, for example, software (e.g., the program 140) and input data or output data for instructions related thereto. The memory 130 may include the volatile memory 132 or the non-volatile memory 134.

The program 140 may be stored as software in the memory 130 and may include, for example, an operating system 142, middleware 144, or application 146.

The input module 150 may receive commands or data to be used by a component of the electronic device 101 (e.g., the processor 120) from outside the electronic device 101 (e.g., from a user). The input module 150 may include, for example, a microphone, a mouse, a keyboard, keys (e.g., buttons), or a digital pen (e.g., a stylus pen).

The sound output module 155 may output sound signals to the outside of the electronic device 101. The sound output module 155 may include, for example, a speaker or a receiver. Speakers can be used for general purposes such as multimedia playback or recording playback. The receiver can be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.

The display module 160 may visually provide information to the outside of the electronic device 101 (e.g., to a user). The display module 160 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the corresponding device. According to one embodiment, the display module 160 may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of force generated by the touch.

The audio module 170 may convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module 170 may acquire sound through the input module 150, or may output sound through the sound output module 155 or an external electronic device (e.g., the electronic device 102, such as a speaker or headphones) connected directly or wirelessly to the electronic device 101.

The sensor module 176 may detect the operating state (e.g., power or temperature) of the electronic device 101 or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or a light sensor.

The interface 177 may support one or more designated protocols that can be used to connect the electronic device 101 directly or wirelessly with an external electronic device (e.g., the electronic device 102). According to one embodiment, the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

The connection terminal 178 may include a connector through which the electronic device 101 can be physically connected to an external electronic device (e.g., the electronic device 102). According to one embodiment, the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

The haptic module 179 may convert electrical signals into mechanical stimulation (e.g., vibration or movement) or electrical stimulation that the user can perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.

The camera module 180 can capture still images and moving images. According to one embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.

The power management module 188 can manage power supplied to the electronic device 101. According to one embodiment, the power management module 188 may be implemented as at least a part of, for example, a power management integrated circuit (PMIC).

The battery 189 may supply power to at least one component of the electronic device 101. According to one embodiment, the battery 189 may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

The communication module 190 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108), and performing communication through the established communication channel. The communication module 190 may include one or more communication processors that operate independently of the processor 120 (e.g., an application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module). Among these communication modules, the corresponding communication module may communicate with the external electronic device 104 through the first network 198 (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or the second network 199 (e.g., a long-range communication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into one component (e.g., a single chip) or may be implemented as a plurality of separate components (e.g., multiple chips). The wireless communication module 192 may identify or authenticate the electronic device 101 within a communication network such as the first network 198 or the second network 199 using subscriber information (e.g., an international mobile subscriber identity (IMSI)) stored in the subscriber identification module 196.

The wireless communication module 192 may support a 5G network following a 4G network and next-generation communication technology, for example, new radio (NR) access technology. NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module 192 may support a high frequency band (e.g., the mmWave band), for example, to achieve a high data rate. The wireless communication module 192 may support various technologies for securing performance in high frequency bands, for example, beamforming, massive multiple-input and multiple-output (massive MIMO), full dimensional MIMO (FD-MIMO), array antenna, analog beam-forming, or large scale antenna. The wireless communication module 192 may support various requirements specified for the electronic device 101, an external electronic device (e.g., the electronic device 104), or a network system (e.g., the second network 199). According to one embodiment, the wireless communication module 192 may support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or 1 ms or less round trip) for realizing URLLC.

The antenna module 197 may transmit signals or power to, or receive them from, the outside (e.g., an external electronic device). According to one embodiment, the antenna module 197 may include an antenna including a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module 197 may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for the communication method used in a communication network such as the first network 198 or the second network 199 may be selected from the plurality of antennas by, for example, the communication module 190. Signals or power may be transmitted or received between the communication module 190 and an external electronic device through the at least one selected antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module 197.

According to various embodiments, the antenna module 197 may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board; an RFIC disposed on or adjacent to a first side (e.g., the bottom side) of the printed circuit board and capable of supporting a designated high frequency band (e.g., the mmWave band); and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second side (e.g., the top or a lateral side) of the printed circuit board and capable of transmitting or receiving signals in the designated high frequency band.

At least some of the components may be connected to each other through a communication method between peripheral devices (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.

According to one embodiment, commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199. Each of the external electronic devices 102 or 104 may be of the same type as, or a different type from, the electronic device 101. According to one embodiment, all or part of the operations performed in the electronic device 101 may be executed in one or more of the external electronic devices 102, 104, or 108. For example, when the electronic device 101 must perform a function or service automatically, or in response to a request from a user or another device, the electronic device 101 may, instead of or in addition to executing the function or service itself, request one or more external electronic devices to perform at least part of the function or service. The one or more external electronic devices that have received the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device 101. The electronic device 101 may process the result as is or additionally and provide it as at least part of a response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device 101 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device 104 may include an Internet of Things (IoT) device. The server 108 may be an intelligent server using machine learning and/or neural networks. According to one embodiment, the external electronic device 104 or the server 108 may be included in the second network 199. The electronic device 101 may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.
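The request-and-respond pattern described above, in which the device either executes a function itself or asks an external device to execute at least part of it and then uses the result as is or after further processing, can be sketched as follows. This is an illustrative sketch only; `handle_request`, `run_locally`, `request_external`, and `postprocess` are hypothetical names, and the external device is simulated by an ordinary callable rather than a real network call.

```python
from typing import Any, Callable, Optional


def handle_request(
    task: Any,
    run_locally: Callable[[Any], Any],
    request_external: Optional[Callable[[Any], Any]] = None,
    postprocess: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Execute the task locally, or delegate it and optionally post-process the result."""
    if request_external is None:
        # No external device is involved: the device executes the function itself.
        return run_locally(task)
    # Ask the external device (e.g., a server) to execute the function.
    result = request_external(task)
    # Use the result as is, or process it further before responding.
    return postprocess(result) if postprocess is not None else result


# Local execution.
local = handle_request([1, 2, 3], run_locally=sum)

# Delegated execution with additional processing of the returned result.
remote = handle_request(
    [1, 2, 3],
    run_locally=sum,
    request_external=lambda values: sum(values),  # stands in for a server call
    postprocess=lambda total: {"total": total},
)
```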

Electronic devices according to various embodiments disclosed in this document may be of various types. Electronic devices may include, for example, portable communication devices (e.g., smartphones), computer devices, portable multimedia devices, portable medical devices, cameras, wearable devices, or home appliances. Electronic devices according to embodiments of this document are not limited to the above-described devices.

The various embodiments of this document and the terms used herein are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various changes, equivalents, or replacements of the embodiments. In connection with the description of the drawings, similar reference numbers may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the items, unless the relevant context clearly indicates otherwise. As used herein, each of the phrases “A or B”, “at least one of A and B”, “at least one of A or B”, “A, B or C”, “at least one of A, B and C”, and “at least one of A, B, or C” may include any one of the items listed together in the corresponding phrase, or any possible combination thereof. Terms such as "first" and "second" may be used simply to distinguish one component from another, and do not limit the components in other respects (e.g., importance or order). Where one (e.g., a first) component is said to be "coupled" or "connected" to another (e.g., a second) component, with or without the terms "functionally" or "communicatively", it means that the one component can be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.

The term “module” used in various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit, for example. A module may be an integrally formed part, or a minimum unit of the part or a portion thereof, that performs one or more functions. For example, according to one embodiment, the module may be implemented in the form of an application-specific integrated circuit (ASIC).

Various embodiments of this document may be implemented as software (e.g., the program 140) including one or more instructions stored in a storage medium (e.g., built-in memory 136 or external memory 138) that can be read by a machine (e.g., the electronic device 101). For example, a processor (e.g., the processor 120) of the machine (e.g., the electronic device 101) may call at least one of the one or more instructions stored in the storage medium and execute it. This allows the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium is a tangible device and does not contain signals (e.g., electromagnetic waves); the term does not distinguish between cases where data is stored semi-permanently in the storage medium and cases where it is stored temporarily.

According to one embodiment, methods according to various embodiments disclosed in this document may be provided as part of a computer program product. Computer program products are commodities and can be traded between sellers and buyers. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online, either through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored or temporarily created in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

According to various embodiments, each component (e.g., module or program) of the above-described components may include a single entity or a plurality of entities, and some of the plurality of entities may be separately disposed in other components. According to various embodiments, one or more of the components or operations described above may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to the way they were performed by the corresponding component prior to the integration. According to various embodiments, operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

FIG. 2 is a block diagram illustrating one or more processors included in the electronic device 101 according to an embodiment. The electronic device 101 of FIG. 2 may correspond to the electronic device 101 of FIG. 1.

Referring to FIG. 2, the electronic device 101 according to an embodiment may include at least one of a CPU 200, an NPU 210, a GPU 220, or a memory 130. The CPU 200, the NPU 210, the GPU 220, and the memory 130 may be electrically and/or operably coupled with each other by an electronic component such as a communication bus 230. The type and/or number of hardware components included in the electronic device 101 are not limited to those shown in FIG. 2. For example, the electronic device 101 may further include the display module 160 and the communication module 190 of FIG. 1.

The CPU 200 of the electronic device 101 according to one embodiment may include hardware components for processing data based on one or more instructions. Hardware components for processing data may include, for example, an Arithmetic and Logic Unit (ALU), a Floating Point Unit (FPU), and/or a Field Programmable Gate Array (FPGA). The FPU may be a module for efficiently processing floating point operations. The ALU may be a module for efficiently processing integer operations. The CPU 200 may have the structure of a multi-core processor such as dual core, quad core, or hexa core. The CPU 200 of FIG. 2 may correspond to an example of the processor 120 and/or the main processor 121 of FIG. 1.

The GPU 220 of the electronic device 101 according to an embodiment may include one or more pipelines that perform a plurality of operations required to execute instructions related to computer graphics. For example, the pipeline of the GPU 220 may include a graphics pipeline or rendering pipeline for generating a 3D image and generating a 2D raster image from the generated 3D image. The graphics pipeline may be controlled based on code that is included in a file stored in the memory 130 and written in a shading language. For example, code written in a shading language may be compiled by the CPU 200 into instructions executable on the GPU 220.

The NPU 210 of the electronic device 101 according to one embodiment may include hardware components to support one or more functions based on a neural network. The neural network is a recognition model implemented in software or hardware that imitates the computational ability of a biological system using a large number of artificial neurons (or nodes). In the present disclosure, a neural network may be referred to as a model. For example, the electronic device 101 according to one embodiment may execute functions similar to human cognitive functions or learning processes based on a neural network. In one embodiment, the one or more functions based on the neural network supported by the NPU 210 may include a function of training a neural network, a function of performing image recognition, voice recognition, and/or handwriting recognition using a trained neural network, a function personalized to the user of the electronic device 101 based on a neural network, and a function of controlling a neural network based on an application using an API (Application Programming Interface).

The CPU 200, NPU 210, and GPU 220 of FIG. 2 may each be included as different integrated circuits in the electronic device 101, or may be included in a single integrated circuit (single IC) based on a System on Chip (SoC). For example, the CPU 200, the NPU 210, the GPU 220, or a combination thereof may be included in a single integrated circuit included in the electronic device 101. The type of processing unit included based on the SoC is not limited to the above example. For example, other hardware components (e.g., a communication processor) not shown in FIG. 2 may be included in a single integrated circuit together with the CPU 200, the NPU 210, and the GPU 220.

The memory 130 of the electronic device 101 according to an embodiment may include hardware components for storing data and/or instructions input to and/or output from the CPU 200, the NPU 210, and/or the GPU 220. The memory 130 may include, for example, volatile memory 132 such as RAM (Random-Access Memory) and/or non-volatile memory 134 such as ROM (Read-Only Memory). The volatile memory 132 may include, for example, at least one of Dynamic RAM (DRAM), Static RAM (SRAM), Cache RAM, and Pseudo SRAM (PSRAM). The non-volatile memory 134 may include, for example, at least one of PROM (Programmable ROM), EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), flash memory, hard disk, compact disk, and eMMC (Embedded Multi Media Card). The memory 130, the volatile memory 132, and the non-volatile memory 134 of FIG. 2 may correspond to the memory 130, the volatile memory 132, and the non-volatile memory 134 of FIG. 1, respectively.

In the memory 130 of the electronic device 101 according to one embodiment, a set of parameters for calculating a neural network may be stored. Parameters representing the neural network may, for example, represent a plurality of nodes included in the neural network and/or weights assigned to connections between the plurality of nodes. The structure of a neural network represented by a set of parameters stored in the memory 130 of the electronic device 101 according to one embodiment will be described later with reference to FIG. 3. Based on a request for a neural network operation, at least one of the CPU 200, GPU 220, or NPU 210 may perform an operation based on the set of parameters.

Referring to FIG. 2, the NPU 210 may include a neural engine, buffer, and/or controller. Although not shown, the neural engine, the buffer, and the controller may be electrically and/or operatively connected to each other by electronic devices such as a communication bus. According to one embodiment, the neural engine and/or controller may be implemented in software. According to one embodiment, the neural engine and/or controller may be implemented in hardware.

The NPU 210 according to one embodiment may perform operations required to execute network-related functions.

The NPU 210 according to one embodiment may at least temporarily store one or more numerical values used in the calculation or one or more output numerical values in order to perform the calculation.

The controller of the NPU 210 according to one embodiment may control operations based on the neural engine included in the NPU 210. According to one embodiment, the controller may be a software module. According to one embodiment, the controller may be a hardware module.

The electronic device 101 according to one embodiment may identify a specific numerical value from one or more bits based on a plurality of data types. The data type may be a predetermined category for interpretation of one or more bits by the electronic device 101. For example, the electronic device 101 may interpret a set of one or more bits based on a data type corresponding to the set and identify data represented by the set. For example, when the electronic device 101 stores one or more bits representing a specific numerical value in the memory 130, the number of bits corresponding to the specific numerical value within the memory 130 may differ depending on the data type.
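The role of a data type in interpreting bits can be illustrated with a minimal sketch. It is not taken from the disclosure; the byte values and variable names are assumptions chosen for illustration, and Python's standard `struct` module stands in for the interpretation performed by the electronic device.

```python
import struct

# The same four bytes (little-endian), interpreted under three data types.
raw = bytes([0x00, 0x00, 0x80, 0x3F])

as_float32 = struct.unpack("<f", raw)[0]   # one 32-bit IEEE 754 floating point number
as_int32 = struct.unpack("<i", raw)[0]     # one 32-bit signed integer
as_int8 = struct.unpack("<4b", raw)        # four 8-bit signed integers

# The number of bytes needed to store one value depends on the data type.
size_float32 = struct.calcsize("<f")
size_float16 = struct.calcsize("<e")
size_int8 = struct.calcsize("<b")
```

Here the identical bit pattern yields the floating point number 1.0, a large 32-bit integer, or four small 8-bit integers, depending only on the data type used to read it.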

The NPU 210 according to one embodiment may support binary arithmetic operations for each of a plurality of data types. As the NPU 210 supports binary arithmetic operations of multiple data types, the buffer can be managed more efficiently. Hereinafter, multiple data types may have different precisions.

As described above, the NPU 210 according to one embodiment may support performance of arithmetic operations based on a designated data type. For example, if the data type included in the neural network stored in the memory 130 and the data type supported by the NPU 210 are different, the NPU 210 may not support operations of the different data type. According to one embodiment, if the precision (or data type) supported by a module capable of performing neural network operations, such as the NPU 210, a graphic processing unit (GPU), a tensor processing unit (TPU), and/or a digital signal processor (DSP), is different from the precision (or data type) of the parameters (e.g., weights and/or biases) provided for the operation of the neural network, the electronic device 101 may change the precision of the parameters using a type conversion algorithm such as quantization. For example, the electronic device 101 may input the parameters to the CPU 200, perform an operation related to the neural network, and change the precision of the parameters using the results of the operation. In the present disclosure, precision, in terms of representing the unit of each representation value within the quantized range, may be referred to as quantization level, resolution, bit depth, digital level, representation density, quantum density, quantum range, indication capability, indication density, indication level, and/or equivalent technical terms.

Quantization may refer to the operation of converting a real number variable into an integer variable. For example, the CPU 200 may convert a variable of a data type representing a floating point number based on 16 bits into a variable of a data type representing an integer based on 8 bits. For example, the CPU 200 may convert a variable of a data type representing a floating point number based on 8 bits to a variable of a data type representing an integer based on 4 bits. Through the quantization, the amount of computation of the neural network model can be reduced and the size of the neural network model can be reduced.
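As a minimal sketch of the conversion described above, the following maps floating point values to signed 8-bit integers using a single shared scale factor. The function names, the choice of a max-absolute-value scale, and the sample weights are assumptions made for illustration; the disclosure does not specify this particular mapping.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the 8-bit range.
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating point values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]           # hypothetical floating point weights
q_weights, scale = quantize_int8(weights)  # 8-bit integers plus one float scale
restored = dequantize(q_weights, scale)    # close to, but not equal to, weights
```

Each restored value differs from the original by at most half of `scale`, which is the precision lost in exchange for the smaller integer representation.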

Methods for performing the quantization can be divided into symmetric methods and asymmetric methods. When quantized through the symmetric method, the values of weights and/or biases included in the model can be proportionally converted. When quantized using the asymmetric method, the values of weights and/or biases included in the model may change non-proportionally. For example, when parameters (e.g., weights and/or biases) of a first data type for expressing floating point numbers are quantized, based on an asymmetric method, into a second data type for expressing integers, the length of a first section of the first data type matching a first integer of the second data type and the length of a second section of the first data type matching a second integer of the second data type may be different from each other. The method of performing quantization may be determined depending on the quantization method supported by the processor. When quantization is possible in an asymmetric manner in the processor, quantization can be performed in an asymmetric manner. The processor may perform quantization through a quantizer. The quantizer may include an application and/or an application programming interface (API) (e.g., runtime library) provided for model calculation. The embodiment is not limited to this, and the quantizer may include an accelerator and/or a combination of applications (or instructions) for controlling the accelerator. According to one embodiment, the quantizer may include a software application for quantizing a set of weights and a bias. The accelerator may support operations for operators to execute models. According to one embodiment, the model may include a machine learning model. According to one embodiment, the accelerator may be a hardware component (e.g., a processor, a component included in the processor, or a component separate from the processor) and/or software within the processor. Regarding the type of the accelerator, the description given with reference to FIG. 2 may be referred to.

The present invention can perform quantization in units of individual operators constituting a neural network model. Quantization operations for individual operators are described in Figure 4. The quantized operator can be stored along with the hash value of the operator before quantization. When performing an operation on a quantized neural network model, the electronic device may perform the operation of the operator before quantization based on the quantized weight and bias. The quantized weights and biases may be identified based on the stored hash values. After quantization is completed, the electronic device may perform a reliability evaluation on the quantized neural network model and replace the model before quantization with a neural network model that has passed the reliability evaluation. The operation for the reliability evaluation is described in FIG. 6. If the neural network model does not pass the reliability evaluation, the neural network model before quantization may be quantized again in the server. The server may receive sub-output data transmitted from various electronic devices (e.g., terminals) and perform quantization on the neural network model before quantization using the received data. The operation for quantization through the server is described in FIG. 8.
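The difference between the symmetric and asymmetric methods can be sketched as follows. This sketch assumes one common reading from general quantization practice, in which the symmetric method uses only a scale (a proportional mapping) and the asymmetric method adds a zero point (an affine, non-proportional mapping). That reading may differ from the section-length mapping described above, and all function names and sample values are hypothetical.

```python
def symmetric_params(values, qmax=127):
    """Scale only: real zero maps to integer zero (proportional mapping)."""
    max_abs = max(abs(v) for v in values)
    return (max_abs / qmax if max_abs > 0 else 1.0), 0


def affine_params(values, qmin=-128, qmax=127):
    """Scale plus zero point: the observed [min, max] fills the integer range."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Round, shift by the zero point, and clamp to the integer range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    return [(q - zero_point) * scale for q in quantized]


skewed = [0.0, 0.1, 0.4, 2.0]   # hypothetical non-negative parameter values
sym_scale, sym_zp = symmetric_params(skewed)
aff_scale, aff_zp = affine_params(skewed)
q_sym = quantize(skewed, sym_scale, sym_zp)   # uses only 0..127
q_aff = quantize(skewed, aff_scale, aff_zp)   # uses the full -128..127
```

For values that are all non-negative, the symmetric mapping leaves the negative half of the integer range unused, while the affine mapping spreads the same values over the whole range with a step roughly half as large.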

Below, before describing embodiments of the present disclosure, terms necessary to describe operations of an electronic device according to the embodiments are defined.

According to one embodiment, the quantizer may be a software application that quantizes the first weight and bias of the first data type to generate the second weight and bias of the second data type. The processor 120 may perform quantization through a quantizer. The quantizer may include an application and/or an application programming interface (API) (eg, runtime library) provided for model calculation. The embodiment is not limited to this, and the quantizer may include an accelerator and/or a combination of applications (or instructions) for controlling the accelerator.

According to one embodiment, an accelerator may be a hardware component or a software application to support operations for an operator. The accelerator may support operations for operators to execute models. According to one embodiment, the model may include a machine learning model. According to one embodiment, the accelerator may include hardware components within the processor 120. For example, the accelerator may include a cortex matrix engine (CME) within a central processing unit (CPU). For example, the accelerator may include a tensor processing unit (TPU) within a graphic processing unit (GPU). According to one embodiment, the accelerator may include software executed by the processor 120. For example, the accelerator may be a software application executed by at least one of the CPU, neural processing unit (NPU), or GPU. According to one embodiment, the accelerator may be a processor 120 capable of executing the model. For example, the accelerator may include a CPU. For example, the accelerator may include an NPU. For example, the accelerator may include a GPU. For example, the accelerator may include a TPU. Below in Figure 3, an example of a neural network according to one embodiment is described.

FIG. 3 is an exemplary diagram illustrating a neural network 300 running on an electronic device according to an embodiment. The electronic device of FIG. 3 may correspond to an example of the electronic device 101 of FIG. 1 and/or FIG. 2. The neural network 300 of FIG. 3 may be obtained, for example, from a set of parameters stored in a memory (e.g., memory 130 of FIGS. 1 and/or 2) by an electronic device according to one embodiment.

Referring to FIG. 3, the neural network 300 may include a plurality of layers. For example, the neural network 300 may include an input layer 310, one or more hidden layers 320, and an output layer 330. The input layer 310 may correspond to a vector and/or matrix representing input data of the neural network 300. For example, a vector representing the input data may have elements corresponding to the number of nodes included in the input layer 310. For example, elements included in the matrix representing the input data may correspond to each of the nodes included in the input layer 310. Signals generated by the input data at each node in the input layer 310 may be transmitted from the input layer 310 to the hidden layers 320. The output layer 330 may generate output data of the neural network 300 based on one or more signals received from the hidden layers 320. For example, the output data may correspond to a vector and/or matrix having elements corresponding to the number of nodes included in the output layer 330.

In one embodiment, first nodes included in a specific layer among the plurality of layers included in the neural network 300 may correspond to a weighted sum of at least one of second nodes of the layer preceding the specific layer within the sequence of the plurality of layers. The electronic device 101 according to one embodiment may identify a weight to be applied to at least one of the second nodes from a set of parameters stored in the memory. Training the neural network 300 may include changing and/or determining one or more weights related to the weighted sum.

Referring to FIG. 3, one or more hidden layers 320 may be located between the input layer 310 and the output layer 330, and may convert input data transmitted through the input layer 310 into values that are easy to predict. The input layer 310, the one or more hidden layers 320, and the output layer 330 may include a plurality of nodes. The one or more hidden layers 320 may be a convolutional filter or a fully connected layer in a convolutional neural network (CNN), or may be various types of filters or layers grouped based on special functions or characteristics. In one embodiment, the one or more hidden layers 320 may be a layer based on a recurrent neural network (RNN) whose output value is re-input to the hidden layer at the current time. The neural network 300 according to one embodiment may include numerous hidden layers 320 to form a deep neural network. Training a deep neural network is called deep learning. Among the nodes of the neural network 300, nodes included in the hidden layers 320 are referred to as hidden nodes.

Nodes included in the input layer 310 and the one or more hidden layers 320 may be connected to each other through connection lines with connection weights, and nodes included in the hidden layers 320 and the output layer 330 may also be connected to each other through connection lines with connection weights. Tuning and/or training the neural network 300 may mean changing the connection weights between nodes included in each of the layers of the neural network 300 (e.g., the input layer 310, the one or more hidden layers 320, and the output layer 330). Tuning of the neural network 300 may be performed based on, for example, supervised learning and/or unsupervised learning.

The electronic device according to one embodiment may tune the neural network 300 based on reinforcement learning in unsupervised learning. For example, the electronic device may change policy information used by the neural network 300 to control the agent based on the interaction between the agent and the environment. The electronic device according to one embodiment may cause a change in the policy information by the neural network 300 in order to maximize the agent's goal and/or reward due to the interaction.

For example, in a state of acquiring the neural network 300, the electronic device according to one embodiment may identify the weights corresponding to the connection lines between the input layer 310, the one or more hidden layers 320, and/or the output layer 330. In order to obtain output data from the neural network 300 based on the identified weights, the electronic device may sequentially obtain weighted sums based on the connection lines along the plurality of layers (e.g., the input layer 310, the one or more hidden layers 320, and the output layer 330) of the neural network 300. The obtained weighted sums may be stored in the NPU 210 and/or memory 130 of FIG. 2. For example, the electronic device may repeatedly update the weighted sum stored in the memory by sequentially obtaining the weighted sum along the plurality of layers.
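The sequential computation of weighted sums along the layers can be sketched as follows. The layer sizes, weights, and function names are hypothetical, and any non-linear function that a practical network might apply between layers is omitted because it is not discussed here.

```python
def layer_forward(inputs, weights, biases):
    """Each output node is a weighted sum of the previous layer's nodes."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Apply the layers in sequence, overwriting the stored sums at each step."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical parameters: one row of weights per node in the receiving layer.
layers = [
    ([[1.0, 2.0], [0.0, -1.0]], [0.0, 1.0]),  # hidden layer: 2 nodes -> 2 nodes
    ([[1.0, 1.0]], [0.5]),                    # output layer: 2 nodes -> 1 node
]
output = network_forward([3.0, 4.0], layers)
```

The variable `signal` plays the role of the weighted sums held in memory: it is replaced after every layer, so only the most recent layer's result needs to be kept.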

Each of the plurality of layers of the neural network 300 may have an independent data type and/or precision. For example, when connection lines between a first layer and a second layer among the plurality of layers have weights based on a first data type for representing floating point numbers, the electronic device may obtain weighted sums based on the first data type from the numerical values corresponding to the nodes of the first layer and the weights. In the above example, when the connection lines between the second layer and a third layer among the plurality of layers have weights based on a second data type for representing integers, the electronic device may obtain weighted sums based on the second data type from the obtained weighted sums and the weights based on the second data type.

When a plurality of layers have different data types, the electronic device according to an embodiment may, for example, use the NPU 210 of FIG. 2 to obtain weighted sums corresponding to each of the plurality of layers based on the different data types. As the electronic device accesses memory based on weighted sums obtained based on different data types, the bandwidth of the memory can be used more efficiently. As the bandwidth of memory is used more efficiently, the electronic device according to one embodiment can more quickly obtain output data from the neural network 300 based on the plurality of layers.

An electronic device according to an embodiment may store sets of parameters representing each of a plurality of neural networks with different precisions. For example, a neural network related to super resolution for upscaling images and/or video may require the precision of a data type for representing floating point numbers based on 32 bits. For example, a neural network related to super resolution for upscaling images and/or video may require the precision of a data type for representing floating point numbers based on 16 bits (e.g., the half-precision floating point format defined by IEEE 754). For example, a neural network for recognizing a subject included in an image and/or video may require the precision of a data type for representing integers based on 8 bits and/or 4 bits. For example, a neural network for performing handwriting recognition may require precision of a data type to represent an integer based on the first bit and/or the second bit. An electronic device according to an embodiment may perform an operation to obtain a weighted sum based on different precisions corresponding to each of a plurality of neural networks.

In Figure 4 below, a quantization operation that changes the weight of a layer in a neural network is described.

Figure 4 is an example diagram for explaining quantization of weights according to an embodiment. Hereinafter, layers included in the neural network model (e.g., the input layer 310, hidden layers 320, and/or output layer 330 in FIG. 3) may correspond to at least one operator.

Referring to Figure 4, model 401 may include a first operator 405 with a first set of weights of a first data type. Model 403 can be created by quantizing model 401. For example, the quantized model 403 may be obtained from model 401. The first' operator 407 can be created by quantizing the first operator 405. Output data may be generated by operations included in the models 401 and 403 based on the input data. Model 403 may include a first' operator 407 with a second set of weights of a second data type. In one embodiment, an operator may be a unit of computation performed by a neural network and/or model. For example, a model may be formed by sequential concatenation of operators distinguished by different parameters (e.g., weights and/or biases).

According to one embodiment, the model 401 may include at least one operator. The first operator 405 may be one of at least one operator included in the model 401. The first operator 405 may include a first set of weights and a bias of a first data type. For example, the first data type may represent a floating point number. As an example, the first data type may represent a floating point number using 32 bits. As an example, the first data type may represent a floating point number using 16 bits. For example, the second data type may represent an integer. For example, the second data type may represent an integer using 8 bits. For example, the second data type may represent an integer using 4 bits. However, the embodiments of the present disclosure are not limited thereto. An electronic device (e.g., the electronic device 101 of FIG. 1) may execute a quantization function described later when the precision (which may be referred to as a quantization level) and/or range of the second data type is lower than that of the first data type and, among the first data type and the second data type, a model based on the second data type can be calculated. In the model 401, at least one processor (e.g., processor 120 in FIG. 1) may obtain first sub-output data from input data through an operation of a first operator. The first sub-output data may be input as input data to a second operator connected to the first operator. The at least one processor 120 may obtain second sub-output data from the first sub-output data through an operation of a second operator. The second sub-output data may be input as input data to a third operator connected to the second operator. The at least one processor may obtain output data of the model 401 through an operation of an output operator on sub-output data. The output operator may generate output data of the neural network model.

According to one embodiment, when the first data type of the first weight set included in the first operator 405 is different from the second data type supported by the accelerator, the at least one processor 120 may change the first weight set of the first data type to the second weight set of the second data type. For example, the at least one processor 120 may change a first weight set, which is a floating point number expressed using 16 bits, to a second weight set, which is an integer expressed using 8 bits. For example, the at least one processor 120 may change a first weight set, which is a floating point number expressed using 32 bits, to a second weight set, which is an integer expressed using 4 bits. An accelerator may be a hardware component or a software application to support operations for an operator. The accelerator may support operations for operators to execute models. According to one embodiment, the model may include a machine learning model. Regarding the type of the accelerator, the description given with reference to FIG. 2 may be referred to. The accelerator can process calculations related to neural networks. For example, the accelerator may perform matrix operations requiring floating point numbers and/or integers of a specific data type. The matrix operations may include multiplication operations of different matrices. To perform the matrix operation, the accelerator may include a plurality of FPUs and/or a plurality of arithmetic logic units (ALUs) for multiplication operations of floating point numbers and/or integers of a specific data type.

According to one embodiment, the data type of the weight set may vary depending on the operator. Therefore, whether to quantize an individual operator can be determined based on the data type of the individual operator.

According to one embodiment, the at least one processor 120 may obtain a set of sub-output data from a set of input data based on an operation of an individual operator. The at least one processor 120 may obtain a set of output data from a set of input data by performing an operation on at least one operator. The at least one processor 120 may store the set of input data, the set of sub-output data, and the set of output data as profile information.
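The collection of profile information can be sketched as follows. The dictionary layout, the helper operators, and the function names are assumptions made for illustration; the disclosure does not specify how profile information is structured.

```python
def run_with_profile(operators, input_data):
    """Run operators in sequence and record every intermediate result."""
    profile = {"input": list(input_data), "sub_outputs": [], "output": None}
    data = input_data
    for operator in operators:
        data = operator(data)                     # sub-output of this operator
        profile["sub_outputs"].append(list(data))
    profile["output"] = list(data)                # result of the last operator
    return profile


def scale_by(factor):
    """Hypothetical operator: multiply every element by a fixed factor."""
    def apply(values):
        return [factor * v for v in values]
    return apply


def shift_by(offset):
    """Hypothetical operator: add a fixed offset to every element."""
    def apply(values):
        return [v + offset for v in values]
    return apply


profile = run_with_profile([scale_by(2.0), shift_by(-1.0)], [1.0, 2.0])
```

Because each sub-output is recorded per operator, a later quantization step can inspect the values that one specific operator actually produced on real input data.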

According to one embodiment, in order to obtain the second weight set of the second data type, the at least one processor 120 may perform quantization on the first weight set based on the distribution of the set of sub-output data.

For example, the at least one processor 120 may obtain a set of sub-output data by performing the operation of the first operator 405 on a set of input data that is actually used data. The quantization of the first weight set of the first operator 405 may be performed based on the interval of the sub-output data. As an example, the numeric interval that can be expressed by the first data type for expressing a floating point number may be the first interval. The numeric interval that can be expressed by the second data type for expressing an integer may be the second interval. As an example, the first interval may range from negative infinity to positive infinity. As an example, the second interval may be from -128 to 127 in the case of a data type expressed by 8 bits. For example, in the case of a data type expressed in 16 bits, the second interval may be from -32768 to 32767. For example, when the set of sub-output data has values between -120 and 110, expressing the first weight set of the first operator 405 in a data type that can be represented as an 8-bit integer, instead of a data type for expressing floating point numbers, can be advantageous in terms of computational speed and accelerator use. Therefore, the at least one processor 120 may change the data type of the first weight set based on the interval of the sub-output data. Through quantization, the at least one processor 120 may obtain, from the first weight set of the first data type for expressing numbers in the first interval, a second weight set of a second data type for expressing numbers in the second interval, whose length is smaller than that of the first interval. The at least one processor 120 may store the quantized second weight set in memory. When the neural network model is executed, the at least one processor 120 may obtain output data by performing the operation of the first operator 405 based on the second weight set.
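The interval comparison in the example above can be sketched as follows. The function names and the rule of picking the narrowest integer type that covers the observed values are assumptions for illustration; the disclosure does not state how the distribution of sub-output data is turned into quantized weights.

```python
# Representable intervals of signed integer data types, keyed by bit width.
INT_RANGES = {8: (-128, 127), 16: (-32768, 32767)}


def observed_interval(sub_outputs):
    """Smallest and largest value seen across all recorded sub-outputs."""
    flat = [value for batch in sub_outputs for value in batch]
    return min(flat), max(flat)


def smallest_fitting_bits(sub_outputs):
    """Narrowest integer width whose interval covers the observed values."""
    low, high = observed_interval(sub_outputs)
    for bits in sorted(INT_RANGES):
        qmin, qmax = INT_RANGES[bits]
        if qmin <= low and high <= qmax:
            return bits
    return None  # no listed integer type covers the observed interval


# Sub-outputs between -120 and 110, as in the example above.
bits = smallest_fitting_bits([[-120.0, 35.5], [110.0, -7.25]])
```

With the values from the example, the observed interval lies inside the 8-bit interval, so the 8-bit type is selected.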

According to one embodiment, the quantization may be performed for each operator. Even if the data type of the first operator's weight set is different from the data type supported by the accelerator and quantization is performed for the first operator, quantization may not be performed for the second operator if the data type of the second operator's weight set is the same as the data type supported by the accelerator.
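The per-operator decision can be sketched as a simple comparison of data types. The operator names, the string labels for data types, and the function name are hypothetical.

```python
def operators_to_quantize(operator_dtypes, accelerator_dtypes):
    """Select only operators whose weight data type the accelerator lacks."""
    return [
        name
        for name, dtype in operator_dtypes.items()
        if dtype not in accelerator_dtypes
    ]


# Hypothetical model: the weight data type is recorded per operator.
operator_dtypes = {
    "first_operator": "float16",
    "second_operator": "int8",
    "third_operator": "float32",
}
pending = operators_to_quantize(operator_dtypes, accelerator_dtypes={"int8"})
```

In this sketch the second operator is skipped because its weights already use a data type that the accelerator supports.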

According to one embodiment, the at least one processor 120 may store the second weight set in the memory along with a hash value of the first weight set. The at least one processor 120 may identify a second weight set based on a hash value of the first weight set when the model 401 is executed after the model 401 is quantized. The at least one processor 120 may perform an operation for the first operator based on a second weight set corresponding to a hash value of the first weight set.
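As a non-limiting illustration, storing and retrieving the second weight set by a hash value of the first weight set may be sketched as follows. The disclosure does not name a hash function or a storage layout. SHA-256 over the packed 32-bit floats, the `weight_hash` helper, and the `QuantizedStore` class are assumptions made for this sketch.

```python
import hashlib
import struct


def weight_hash(weights):
    """Hash a weight set by packing it as little-endian 32-bit floats
    (the hash function and encoding are assumptions of this sketch)."""
    payload = struct.pack(f"<{len(weights)}f", *weights)
    return hashlib.sha256(payload).hexdigest()


class QuantizedStore:
    """In-memory mapping from hash(first weight set) to second weight set."""

    def __init__(self):
        self._by_hash = {}

    def put(self, first_weights, second_weights):
        self._by_hash[weight_hash(first_weights)] = list(second_weights)

    def get(self, first_weights):
        # Returns None when no quantized weight set was stored for this hash.
        return self._by_hash.get(weight_hash(first_weights))
```

When the model is executed later, a lookup with the unchanged first weight set returns the stored second weight set, and a lookup with any other weight set returns `None`.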

According to one embodiment, an accelerator may be a hardware component or a software application to support operations for an operator. The accelerator may support operations for operators to execute models. According to one embodiment, the model may include a machine learning model. Regarding the type of the accelerator, the information described below with respect to FIG. 2 may be referred to.

The accelerator can process calculations related to neural networks. For example, the accelerator may perform matrix operations requiring floating point numbers and/or integers of a specific data type. The matrix operations may include multiplication operations of different matrices. To perform the matrix operation, the accelerator may include a plurality of FPUs and/or a plurality of arithmetic logic units (ALUs) for multiplication operations of floating point numbers and/or integers of a specific data type. The FPU may be a module for efficiently processing floating point operations. The ALU may be a module for efficiently processing integer operations. The accelerator may not support operations on high-precision data depending on the model. For example, the accelerator may perform an operation based on a second data type for representing integers rather than a first data type for representing floating point numbers. For example, the accelerator may perform an operation based on a second data type for representing an integer through 8 bits rather than a first data type for representing an integer through 64 bits. The accelerator can mainly perform low-precision calculations for versatility. Therefore, the at least one processor 120 can increase accelerator utilization by changing the first weight set of the first data type to the second weight set of the second data type.

According to one embodiment, the model 403 may include at least one operator. The 'first' operator 407 may be one of at least one operator included in the model 403. The 'first' operator 407 may include a second set of weights and a bias based on a second data type. For example, the second data type may represent a floating point number. For example, the second data type may represent a floating point number using 16 bits. For example, the second data type may represent an integer. For example, the second data type may represent an integer using 8 bits. For example, the second data type may represent an integer using 4 bits. For example, the second data type may represent an integer using 2 bits.

Referring to FIG. 5, an electronic device (e.g., the electronic device 101 of FIG. 1) may include a memory 501, a first processor 511, and a second processor 521. The memory 501 may store profile information 507. The profile information 507 may include input data 502, data for the model 503, and output data 505. The model 503 may include at least one operator including the kth operator 504. The kth operator 504 may include a first weight set and a bias. The kth operator 504 may be quantized through the quantizer 514. The kth operator 504 may be changed to the k'th operator 506 through a quantization process. The first processor 511 may include a quantizer 514. The first processor 511 and/or the second processor 521 may include a floating point unit (FPU) and/or an accelerator. An accelerator may be a hardware component or a software application to support operations for an operator. The accelerator may support operations for operators to execute models. According to one embodiment, the model may include a machine learning model. Regarding the type of the accelerator, the information described below with respect to FIG. 2 may be referred to. The memory 501, the first processor 511, and the second processor 521 may be electronically and/or operably coupled with each other by an electronic component such as a communication bus 531. Hereinafter, hardware being operably coupled may mean that a direct or indirect connection between the hardware is established, by wire or wirelessly, such that second hardware is controlled by first hardware among the hardware. According to one embodiment, the processor (e.g., the first processor 511 and/or the second processor 521) may perform, through the FPU, operations on operators having data types supported by the FPU. However, embodiments of the present disclosure may not be limited thereto.
According to one embodiment, the processor may perform operations on operators having data types supported by the accelerator through the accelerator. However, embodiments of the present disclosure may not be limited thereto. According to one embodiment, the quantizer may be a software application for quantizing a weight set and bias. The first processor 511 may quantize the first weight and bias of the first data type into the second weight and bias of the second data type through the quantizer 514.

According to one embodiment, the first processor 511 may be a central processing unit (CPU). The first processor 511 may generate the k'th operator 506 by quantizing the first weight and bias of the kth operator 504 through the quantizer 514. The FPU can perform an operation on the first weight of the first data type that requires high precision before quantization. The FPU may be a module for efficiently processing floating point operations. The quantizer 514 may include an application and/or an application programming interface (API) (eg, runtime library) provided for model calculation. The embodiment is not limited to this, and the quantizer 514 may include a combination of an accelerator and/or an application for controlling the accelerator. According to one embodiment, the quantizer 514 may be a software application for quantizing the weight set and bias calculated by the accelerator. The processor may quantize the first weight and bias of the first data type into the second weight and bias of the second data type through the quantizer 514. The accelerator may support operations for operators to execute models.

According to one embodiment, the first processor 511 may identify the second data type supported by the accelerator for computation for an operator. The second data type may correspond to low precision. For example, the accelerator may support only data types expressed as 8-bit integers and data types expressed as 16-bit integers. The first processor 511 may identify the first data type of the first weight set included in the first operator, the first data type being different from the second data type. For example, the first data type may be a data type expressed as a 32-bit floating point number. Based on identifying the first data type that is different from the second data type, the first processor 511 may obtain the first weight set of the first data type included in the kth operator 504, which is one of at least one operator included in the model 503. For example, the kth operator 504 of the model 503 may have a first weight set of a first data type expressed as a 32-bit floating point number, and the accelerators may support a second data type expressed as an 8-bit integer. The first processor 511 may obtain the first weight set of the kth operator 504 based on identifying that the first data type and the second data type are different.

According to one embodiment, the first processor 511 may quantize the obtained first weight set. According to one embodiment, in order to obtain the second weight set of the second data type, the first processor 511 may perform quantization on the first weight set based on the distribution of the set of sub-output data. For example, if the set of sub-output data has values between -120 and 110, expressing the first weight set of the kth operator in the data type expressed as an 8-bit integer supported by the accelerator, instead of the data type for representing floating point numbers, may be advantageous in terms of computational speed and use of the accelerators included in the electronic device. Therefore, the first processor 511 may change the data type of the first weight set based on the interval of the sub-output data. Through quantization, the first processor 511 may obtain, from the first weight set of the first data type for expressing numbers in the first interval, a second weight set of the second data type for expressing numbers in a second interval having a length smaller than the length of the first interval. The first processor 511 may store the quantized second weight set in the memory 501. When executing the neural network model 503, the first processor 511 may obtain output data by performing the operation of the kth operator 504 based on the second weight set.

According to one embodiment, the second data type of the second weight set of the k'th operator 506 obtained through the quantization may be operable by the accelerator. This is because the accelerator supports the second data type. The first processor 511 may perform an operator's calculation for the second weight set in at least one of the accelerators. When two or more neural network models are executed, the first processor 511 can designate an accelerator on which operations of operators included in each model will be performed. Therefore, the computation speed and efficiency of a quantized model may be higher than the computation speed and efficiency of a non-quantized model. Since the input data and output data input to the model by the user are used to evaluate the accuracy of the quantized model, the electronic device 101 can obtain a model personalized to the user based on quantization.

After obtaining the quantized operator through the operation described above in FIG. 5, the electronic device can measure the reliability of a model including the quantized operator. Based on the result of measuring the reliability, the electronic device may determine whether to replace the existing operator included in the model with the quantized operator. In Figure 6 below, the reliability measurement operation is explained.

Referring to FIG. 6, the model 611 may include a neural network. The model 611 may include at least one operator including a first operator 613. The model 621 may be created by quantizing the model 611. For example, the model 621 may be obtained from the model 611 by the quantization operation described with reference to FIG. 5. The 'first' operator 623 may be created by quantizing the first operator 613. First output data 615 may be generated by an operation included in the model 611 based on the input data 601. Second output data 625 may be generated by an operation included in the model 621 based on the input data 601.

According to one embodiment, at least one processor (e.g., processor 120 in FIG. 1) may evaluate the quantized model 621 based on the difference between the first output data 615 and the second output data 625. Since the model 621 is created by quantizing the model 611, the reliability of the model 621 may be evaluated in order to replace the model 611 with the model 621. The at least one processor 120 may evaluate reliability by comparing the first output data 615 of the model 611 and the second output data 625 of the model 621. This is because, when the difference between output data for the same input data is less than the reference value, there is a low possibility that a problem will occur even if the model 621 replaces the model 611. The at least one processor 120 may identify, from the profile information generated by execution of the model 611, a set of first output data 615 obtained by performing an operation for the at least one operator. Because the model 611 has been executed more times than are needed for quantization and/or reliability evaluation, the profile information may include more than a specified number of input data, more than a specified number of sub-output data, and more than a specified number of output data. Therefore, the profile information may include a set of input data, a set of sub-output data, and a set of output data for the model 611.

According to one embodiment, the at least one processor 120 may perform a reliability evaluation on the model 621 based on obtaining the second weight set. For example, the at least one processor 120 may perform a reliability evaluation on the model 621 based on identifying that the creation of the model 621 is complete. According to one embodiment, the at least one processor 120 may identify, from the profile information, the set of the first output data 615 obtained by performing an operation for the at least one operator. The at least one processor 120 may obtain a set of second output data 625 from the set of input data 601 by performing an operation for the at least one operator based on the second weight set. For example, the at least one processor 120 may obtain the set of second output data 625 through an operation on the input data 601 in the model 621. Based on a set of difference values between the first output data 615 and the corresponding second output data 625, the at least one processor 120 may replace the first weight set included in the first operator 613 of the model 611 with the second weight set. This is because, when the difference values between the first output data 615 and the second output data 625 are all less than the reference value, there is a low possibility of a problem occurring even if the model 621 replaces the model 611. For example, if the difference values are about 0.5%, about 0.7%, about 0.5%, about 0.2%, and about 0.9%, and the reference value is about 1%, the model 621 can replace the model 611. However, when at least one difference value among the set of difference values between the first output data 615 and the second output data 625 is greater than or equal to the reference value, the model 621 cannot replace the model 611. For example, if the difference values are about 0.5%, about 0.7%, about 0.5%, about 3.5%, and about 0.9%, and the reference value is about 1%, the model 621 cannot replace the model 611. When at least one difference value among the set of difference values between the first output data 615 and the second output data 625 is greater than or equal to the reference value, the at least one processor 120 may perform quantization again, or quantization may be performed through the server.
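As a non-limiting illustration, the reliability evaluation may be sketched as follows. The disclosure does not state how a difference value is computed. The relative difference in percent used by `percent_differences` is an assumption of this sketch, and both function names are hypothetical.

```python
def percent_differences(first_outputs, second_outputs, eps=1e-12):
    """Relative difference, in percent, between each first output and the
    corresponding second output (the metric is an assumption of this sketch)."""
    return [
        abs(a - b) / max(abs(a), eps) * 100.0
        for a, b in zip(first_outputs, second_outputs)
    ]


def can_replace(differences, reference):
    """The quantized model may replace the original only when every
    difference value is below the reference value."""
    return all(d < reference for d in differences)
```

Applied to the two examples above with a reference value of 1, the first set of difference values passes and the second set fails because of the single value of 3.5.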

According to one embodiment, when at least one difference value among the set of difference values between the first output data 615 and the second output data 625 is greater than or equal to a reference value, the at least one processor 120 may perform quantization again. Since quantization takes time and resources, the time period in which quantization of the model 611 is performed may be adjusted. For example, the at least one processor 120 may perform quantization of the model 611 during off-peak hours. As an example, quantization of the model 611 may be performed in the middle of the night. For example, based on the amount of mobile phone usage depending on the time of day, quantization may be performed during times when mobile phone usage is low. In the case of students, the times when mobile phone usage is low may be class times. For office workers, the times when mobile phone usage is low may be business hours. For example, the at least one processor 120 may display a notification on the display to guide the user to select a time at which quantization will be performed. The at least one processor 120 may perform quantization at a time designated by the user.
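As a non-limiting illustration, selecting a time for re-quantization may be sketched as follows. The per-hour usage list and the function names are hypothetical. The disclosure does not specify how usage is measured.

```python
def quietest_hour(usage_by_hour):
    """Return the hour (0-23) with the lowest recorded usage.
    `usage_by_hour` is a hypothetical list of 24 usage amounts."""
    if len(usage_by_hour) != 24:
        raise ValueError("expected one usage value per hour of the day")
    return min(range(24), key=usage_by_hour.__getitem__)


def scheduled_hour(usage_by_hour, user_selected_hour=None):
    """Prefer the time designated by the user; otherwise fall back to the
    hour with the lowest usage."""
    if user_selected_hour is not None:
        return user_selected_hour
    return quietest_hour(usage_by_hour)
```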

According to one embodiment, when at least one difference value among the set of difference values between the first output data 615 and the second output data 625 is greater than or equal to a reference value, the at least one processor 120 may perform quantization through the server. Quantization performed in the server is described below with reference to FIG. 8.

In FIG. 6, all operators constituting the model 621 are shown as if quantized, but embodiments of the present disclosure are not limited thereto. According to one embodiment, the second operator includes weight sets of the second data type supported by the accelerator, so quantization may not be performed. A second operator on which quantization is not performed may have the same weight set and bias in the model 621 and the model 611.

In addition, in FIG. 6, the model 621 is shown as being stored in memory separately from the model 611, but the embodiment of the present disclosure is not limited thereto. According to one embodiment, the at least one processor 120 may not store the model 621 itself, but may store the second weight set and bias of the quantized operator in memory. The at least one processor 120 may store the quantized second weight set in the memory together with a hash value of the first weight set. When the model 611 is used, the at least one processor 120 may identify a second weight set based on a hash value of the first weight set. The at least one processor 120 may perform an operation for the first operator 613 based on a second weight set corresponding to a hash value of the first weight set.

Referring to Figure 7, in operation 701, according to one embodiment, at least one processor (eg, processor 120 of Figure 1) may obtain a first set of weights. A model including a neural network may include a first operator (eg, the first operator 613 in FIG. 6). The first weight set may be weights included in the first operator.

At operation 702, according to one embodiment, the at least one processor 120 may determine whether it identifies a second data type of the accelerator that is different from the first data type of the first operator 613, the absence of quantization information, and/or the absence of a quantization model. When identifying the second data type of the accelerator that is different from the first data type of the first operator 613, the at least one processor 120 may perform operation 704. When quantization information is not identified, the at least one processor 120 may perform operation 704. When identifying the absence of a quantization model, the at least one processor 120 may perform operation 704. When the at least one processor 120 identifies the second data type of the accelerator that is the same as the first data type of the first operator 613, or when quantization information and a quantization model are identified, the at least one processor 120 may perform operation 703. When identifying the second data type of the accelerator as the same as the first data type of the first operator 613, the at least one processor 120 may perform operation 703. When quantization information is identified, the at least one processor 120 may perform operation 703. When a quantization model is identified, the at least one processor 120 may perform operation 703. According to one embodiment, the at least one processor 120 may identify a second data type of the accelerator that is different from the first data type of the first operator (e.g., the first operator 613 in FIG. 6). For example, the first data type may represent a 32-bit floating point number. The accelerator may support only a second data type expressed as an 8-bit integer and a second data type expressed as a 16-bit integer. For example, the first data type may represent a floating point number using 16 bits. The second data type may represent an integer using 8 bits. However, the embodiments of the present disclosure are not limited thereto. According to one embodiment, if the precision and/or range of the second data type is lower than that of the first data type, and the electronic device 101 can perform the computation of the model based on the second data type, the quantization function may be performed.

According to one embodiment, the at least one processor 120 may identify the absence of quantization information and/or a quantization model. According to one embodiment, the at least one processor 120 may identify the absence of a quantization model. For example, the quantization model may be a quantized model. As an example, the quantized model may be a first' operator corresponding to the hash value of the first operator. If there is no other operator corresponding to the hash value of the first operator, the at least one processor 120 may identify the absence of a quantized model. According to one embodiment, the at least one processor 120 may identify the absence of quantization information. For example, the quantization information may be a value included in the quantized model. As an example, the quantization information may be a second weight set and/or bias within the first' operator. In the absence of the second weight set and/or bias in the first' operator, the at least one processor 120 may identify the absence of quantization information.
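As a non-limiting illustration, the two absence checks may be sketched as follows. The dictionary layout, with a hash of the first operator as the key and `"weights"` and `"bias"` fields as the value, is an assumption of this sketch. Treating either missing field as absence of quantization information is one possible reading of "and/or".

```python
def quantized_model_absent(store, first_hash):
    """True when no first' operator is stored for the first operator's hash."""
    return first_hash not in store


def quantization_info_absent(store, first_hash):
    """True when the stored first' operator lacks a second weight set or a
    bias (either one missing counts as absent in this sketch)."""
    entry = store.get(first_hash)
    if entry is None:
        return True
    return entry.get("weights") is None or entry.get("bias") is None
```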

In operation 703, according to one embodiment, the at least one processor 120 may perform the operation of the first operator with the quantized second weight set. The quantized second weight set may be stored in the memory along with a hash value of the first weight set. When the neural network model is used, the at least one processor 120 may identify the second weight set based on the hash value of the first weight set. The at least one processor 120 may perform the operation of the first operator 613 based on the second weight set corresponding to the hash value of the first weight set.

In operation 704, according to one embodiment, the at least one processor 120 may perform a first operator operation with a first set of weights through a quantizer. This is because sub-output data is required for quantization. The first operator operation using the first weight set may take more time than the first operator operation using the quantized second weight set.

According to one embodiment, in operation 705, the at least one processor 120 may collect and store profile information through the quantizer. The profile information may include a set of input data, a set of sub-output data, and a set of output data for the neural network model. The set of input data may be obtained by executing the neural network model.
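As a non-limiting illustration, collecting profile information during execution may be sketched as follows. The `ProfileInfo` class, the `run_and_profile` function, and the treatment of the model as a simple chain of operators are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileInfo:
    inputs: list = field(default_factory=list)       # set of input data
    sub_outputs: dict = field(default_factory=dict)  # operator name -> sub-output data
    outputs: list = field(default_factory=list)      # set of output data


def run_and_profile(operators, input_data, profile):
    """Execute a chain of (name, function) operators on one input and record
    the input, each operator's sub-output, and the final output."""
    profile.inputs.append(input_data)
    value = input_data
    for name, function in operators:
        value = function(value)
        profile.sub_outputs.setdefault(name, []).append(value)
    profile.outputs.append(value)
    return value
```

Each execution of the model adds one entry to every set, so repeated use of the application gradually builds up the data that quantization and reliability evaluation draw on.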

In operation 706, according to one embodiment, the at least one processor 120 may generate and store second weight sets through the quantizer. According to one embodiment, the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. For example, the at least one processor 120 may obtain a set of sub-output data by performing the operation of the first operator on a set of input data that is actually used data. Quantization of the first weight set of the first operator may be performed based on the interval of the sub-output data. As an example, the numeric interval that can be expressed by the first data type for expressing a floating point number may be a first interval. The numeric interval that can be expressed by the second data type for expressing an integer may be a second interval. The first interval may range from negative infinity to positive infinity. The second interval may be from -128 to 127 in the case of a data type expressed in 8 bits. In the case of a data type expressed in 16 bits, the second interval may be from -32768 to 32767. When the set of sub-output data has values between -120 and 110, expressing the first weight set of the first operator in a data type expressed as an 8-bit integer, instead of a data type for expressing a floating point number, may be advantageous in terms of computational speed and use of accelerators. Therefore, the at least one processor 120 may change the data type of the first weight set based on the interval of the sub-output data. The at least one processor 120 may store the quantized second weight set in memory. When the neural network model is executed, the at least one processor 120 may obtain output data by performing the operation of the first operator based on the second weight set.
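As a non-limiting illustration, mapping a floating point weight set onto an integer interval may be sketched as follows. The disclosure does not give a quantization formula. Symmetric per-set scaling is a commonly used stand-in technique chosen for this sketch, and it does not model how the distribution of the sub-output data steers the quantization. The function names are hypothetical.

```python
def quantize_symmetric(weights, bits=8):
    """Map floating point weights to signed integers with one shared scale.

    Values are clamped to [-q_max, q_max], so the lowest code of the integer
    type (for example -128 for 8 bits) is left unused."""
    q_max = (1 << (bits - 1)) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    quantized = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating point values from the integer weights."""
    return [q * scale for q in quantized]
```

The scale would have to be stored together with the integer weights so that results computed on the accelerator can be interpreted. Each recovered weight differs from the original by at most about half of the scale.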

Figure 8 shows a flow of operations for executing quantization using a server according to an embodiment. Based on the execution of an application including a neural network model, an electronic device (eg, the electronic device 101 of FIG. 1) may perform the following operations.

Referring to FIG. 8, in operation 801, according to one embodiment, at least one processor (e.g., processor 120 of FIG. 1) of the electronic device 101 may perform the operation of the first operator with the first weight set through a quantizer. The operation 801 may be performed similarly to the operation 704 of FIG. 7. The at least one processor 120 may perform operation 801 when identifying a second data type of the accelerator that is different from the first data type of the first operator, or when identifying the absence of quantization information and/or the absence of a quantization model. When the at least one processor 120 identifies a second data type of the accelerator that is the same as the first data type of the first operator and identifies quantization information and a quantization model, the at least one processor 120 may perform the operation of the first operator with the quantized second weight set. Sub-output data may be required for quantization.

In operation 802, the at least one processor 120 may collect and store profile information through the quantizer. The profile information may include a set of input data, a set of sub-output data, and a set of output data. According to one embodiment, the at least one processor 120 may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.

At operation 803, according to one embodiment, the at least one processor 120 may identify a reliability of the model that is outside a reference range. To perform operation 803, the at least one processor 120 may evaluate the reliability of the quantized model. The reliability of the quantized model may be identified based on whether at least one difference value among a set of difference values between first output data and second output data is greater than or equal to a reference value. The first output data may be obtained by performing an operation for at least one operator based on the first weight set. The second output data may be obtained by performing an operation for at least one operator based on the second weight set. According to one embodiment, when at least one difference value among the set of difference values between the first output data and the second output data is greater than or equal to a reference value, the at least one processor 120 may perform quantization through a server (e.g., the server 108 of FIG. 1).

In operations 804 and 805, according to one embodiment, the at least one processor 120 may transmit sub-output data information. The at least one processor 120 may transmit the sub-output data to the server 108. Based on identifying that at least one difference value of the set of difference values between the first output data and the corresponding second output data is greater than or equal to a reference value, the at least one processor 120 may transmit the set of sub-output data obtained from the profile information to the server 108 through the communication circuit. The sub-output data information may be generated through the operation of the first operator including the first weight set.

At operation 806, according to one embodiment, the server 108 may receive sub-output data information. The sub-output data information may be received from a plurality of electronic devices. The at least one processor 120 may perform quantization through sub-output data information transmitted from a plurality of electronic devices of the same model. Since input data, which is actual data, is not transmitted to the server 108, this may be advantageous in terms of privacy protection compared to the case where input data is transmitted to the server 108. For example, since the input data and output data input to the model by the user are used to evaluate the accuracy of the quantized model, the electronic device 101 may obtain a model personalized to the user based on quantization.

In operation 807, according to one embodiment, the server 108 may aggregate sub-output data information. When a designated amount of sub-output data information is aggregated, the at least one processor 120 may stop receiving sub-output data information.
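As a non-limiting illustration, aggregation up to a designated amount may be sketched as follows. The `SubOutputAggregator` class, its methods, and the choice to count individual sub-output values are assumptions of this sketch.

```python
class SubOutputAggregator:
    """Collects sub-output values from devices until a designated amount
    has been aggregated, then stops accepting further values."""

    def __init__(self, designated_amount):
        self.designated_amount = designated_amount
        self.values = []

    @property
    def accepting(self):
        return len(self.values) < self.designated_amount

    def receive(self, sub_outputs):
        """Store values from one device; return False once reception has stopped."""
        if not self.accepting:
            return False
        remaining = self.designated_amount - len(self.values)
        self.values.extend(sub_outputs[:remaining])
        return True

    def observed_interval(self):
        """Interval spanned by the aggregated sub-output values."""
        return min(self.values), max(self.values)
```

Once reception stops, the aggregated interval could serve as the input to the quantization step that produces the third weight set.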

At operation 808, according to one embodiment, the server 108 may generate third weight sets. According to one embodiment, the server 108 may obtain the third weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data. For example, quantization of the first weight set of the first operator may be performed based on the interval of the sub-output data. Through quantization, the server 108 may obtain, from the first weight set of the first data type for expressing numbers in the first interval, a third weight set of the second data type for expressing numbers in a third interval having a length smaller than the length of the first interval. For information about quantization, the information described in operation 706 of FIG. 7 may be referred to.

In operations 809 and 810, according to one embodiment, the server 108 may transmit third weight sets. The server 108 may transmit, to the electronic device 101 through the communication circuit, the third weight sets of the second data type obtained by performing quantization of the first weight set based on the sub-output data. The server 108 may transmit the third weight sets to a plurality of electronic devices that have requested quantization according to execution of the application.

In operation 811, according to one embodiment, the at least one processor 120 may store the received third weight sets. The at least one processor 120 may store the received third weight set in the memory. The at least one processor 120 may obtain output data by performing an operation for the first operator based on the third weight set in a neural network.

Referring to FIG. 9, in operation 901, according to one embodiment, the at least one processor 120 may obtain a first weight set. According to one embodiment, the at least one processor 120 may perform operation 901 of FIG. 9 based on whether the data type of the first weight set is different from the data type supported by the accelerator, whether quantization information is absent, and/or whether a quantization model is absent. According to one embodiment, the at least one processor 120 may identify a second data type of the accelerator that is different from the first data type of the operator (e.g., the first operator 405 in FIG. 4). The at least one processor 120 may identify the absence of quantization information and/or the absence of a quantization model. When identifying a data type supported by the accelerator that is different from the data type of the first weight set, the absence of the quantization information, and/or the absence of the quantization model, the at least one processor 120 may obtain the first weight set in order to quantize the first weight set. The precision required for the first data type may be higher than the precision required for the second data type. For example, the first data type may represent a floating point number using 16 bits. The second data type may represent an integer using 8 bits. According to one embodiment, the data type of the weight set may differ depending on the individual operator. Therefore, whether to quantize an individual operator may be determined based on the data type of the individual operator.

In operation 902, according to one embodiment, the at least one processor 120 may obtain a set of sub-output data from profile information. This is because quantization is performed based on the set of sub-output data. The profile information may include the set of input data, the set of sub-output data, and the set of output data. According to one embodiment, the at least one processor 120 may obtain a set of sub-output data from a set of input data based on the operation of the first operator. The at least one processor 120 may obtain a set of output data from a set of input data by performing an operation on at least one operator.
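One way to picture the profile information is as a record filled in while the model runs on sample inputs. The sketch below is an assumption-laden illustration: `ProfileInfo`, `run_and_profile`, and the representation of operators as `(name, function)` pairs are hypothetical names introduced here, and the operators are assumed to form a simple chain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


@dataclass
class ProfileInfo:
    # Hypothetical layout: inputs, per-operator sub-outputs, and final outputs.
    inputs: List[Vector] = field(default_factory=list)
    sub_outputs: Dict[str, List[Vector]] = field(default_factory=dict)
    outputs: List[Vector] = field(default_factory=list)


def run_and_profile(
    operators: Sequence[Tuple[str, Callable[[Vector], Vector]]],
    sample: Vector,
    profile: ProfileInfo,
) -> Vector:
    """Run the operators in order and record every intermediate result."""
    profile.inputs.append(list(sample))
    data = list(sample)
    for name, fn in operators:
        data = fn(data)  # the sub-output of this operator feeds the next one
        profile.sub_outputs.setdefault(name, []).append(list(data))
    profile.outputs.append(list(data))
    return data


if __name__ == "__main__":
    ops = [
        ("scale", lambda xs: [2.0 * x for x in xs]),
        ("shift", lambda xs: [x + 1.0 for x in xs]),
    ]
    info = ProfileInfo()
    run_and_profile(ops, [1.0, 2.0], info)
    print(info.sub_outputs["scale"], info.outputs)
```

Recording sub-outputs per operator is what later allows each operator to be quantized against its own observed value range.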

In operation 903, according to one embodiment, the at least one processor 120 may obtain a second weight set based on the set of sub-output data. According to one embodiment, in order to obtain the second weight set of the second data type, the at least one processor 120 may perform quantization on the first weight set based on the distribution of the set of sub-output data. Quantization of the first weight set of the first operator may be performed based on the interval of the sub-output data. As an example, the numeric interval that can be expressed by the first data type, which represents floating point numbers, may be a first interval. The numeric interval that can be expressed by the second data type, which represents integers, may be a second interval. Through quantization, the at least one processor 120 may obtain, from the first weight set of the first data type for expressing numbers in the first interval, the second weight set of the second data type for expressing numbers in the second interval, whose length is smaller than the length of the first interval.
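The description does not give an exact quantization formula, so the following is only a sketch under stated assumptions: a symmetric scheme in which the step size (scale) is derived from the largest absolute value observed in the sub-output data and then used to map floating point weights to signed 8-bit integers. The function names `quantization_scale` and `quantize` are hypothetical.

```python
from typing import List, Sequence


def quantization_scale(sub_outputs: Sequence[float], num_bits: int = 8) -> float:
    """Derive a step size from the interval in which the sub-output values fall.

    Assumption: symmetric quantization, so the scale is max|value| / qmax.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in sub_outputs)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: Sequence[float], scale: float, num_bits: int = 8) -> List[int]:
    """Map floating point values to clipped signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


if __name__ == "__main__":
    observed_sub_outputs = [-2.54, 1.0, 2.0]
    scale = quantization_scale(observed_sub_outputs)
    first_weight_set = [0.5, -1.0, 10.0]
    second_weight_set = quantize(first_weight_set, scale)
    print(scale, second_weight_set)
```

Values that fall outside the representable integer range are clipped, which is why the choice of interval matters for accuracy.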

At operation 904, according to one embodiment, the at least one processor 120 may store the second weight set based on obtaining the second weight set. When executing the neural network model, the at least one processor 120 may perform calculations based on the quantized second weight set, and may obtain output data by performing an operation for the first operator based on the second weight set. According to one embodiment, the at least one processor 120 may store the second weight set in the memory along with a hash value of the first weight set. When the neural network model is used, the at least one processor 120 may identify the second weight set based on the hash value of the first weight set. The at least one processor 120 may perform an operation for the first operator based on the second weight set corresponding to the hash value of the first weight set.
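Storing the second weight set together with a hash value of the first weight set can be sketched as a small lookup table. The hash function (SHA-256 over the packed weights), the `QuantizedWeightStore` class, and its method names are assumptions made for illustration; the description does not specify them.

```python
import hashlib
import struct
from typing import Dict, List, Optional, Sequence


def weight_hash(weights: Sequence[float]) -> str:
    """Hash a weight set. Assumption: SHA-256 over little-endian doubles."""
    payload = struct.pack(f"<{len(weights)}d", *weights)
    return hashlib.sha256(payload).hexdigest()


class QuantizedWeightStore:
    """Hypothetical in-memory store keyed by the hash of the original weights."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, List[int]] = {}

    def put(self, first_weights: Sequence[float], second_weights: Sequence[int]) -> str:
        key = weight_hash(first_weights)
        self._by_hash[key] = list(second_weights)
        return key

    def get(self, first_weights: Sequence[float]) -> Optional[List[int]]:
        # Returns None when no quantized set exists for these weights.
        return self._by_hash.get(weight_hash(first_weights))


if __name__ == "__main__":
    store = QuantizedWeightStore()
    store.put([0.5, -1.0], [25, -50])
    print(store.get([0.5, -1.0]))
    print(store.get([0.25, -1.0]))
```

Keying by hash lets the device find an existing quantized set for an operator without comparing the full weight arrays.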

FIG. 10 illustrates a flow of operations of an electronic device for storing a second set of weights based on reliability evaluation according to an embodiment. The flow of operation of FIG. 10 may embody operation 904 of FIG. 9 .

Referring to FIG. 10, in operation 1001, according to one embodiment, the at least one processor 120 may obtain a set of first output data from profile information. According to one embodiment, the at least one processor 120 may perform a reliability evaluation on a quantized neural network model based on obtaining the second weight set. For example, based on identifying that the generation of the quantized neural network model is complete, a reliability evaluation of the quantized neural network model may be performed. According to one embodiment, the at least one processor 120 may identify a set of first output data obtained by performing an operation for the at least one operator from the profile information.

In operation 1002, according to one embodiment, the at least one processor 120 may obtain a second output data set based on a second set of weights. The at least one processor 120 may obtain a second set of output data from the set of input data by performing an operation for the at least one operator based on the second weight set. For example, the at least one processor 120 may obtain a set of second output data through an operation on the input data in the quantized neural network model.

At operation 1003, according to one embodiment, the at least one processor 120 may replace the first weight set with the second weight set based on the difference values between the first output data and the second output data. Based on a set of difference values between the first output data and the corresponding second output data, the at least one processor 120 may replace the first weight set included in the first operator of the neural network model with the second weight set. When the difference values between the first output data and the second output data are all less than the reference value, replacing the neural network model before quantization with the quantized neural network model is unlikely to cause a problem. For example, if the difference values are about 0.5%, about 0.7%, about 0.5%, about 0.2%, and about 0.9%, and the reference value is about 1%, the quantized neural network model can replace the neural network model before quantization. However, when at least one of the difference values between the first output data and the second output data is greater than or equal to the reference value, the quantized neural network model cannot replace the neural network model before quantization. For example, if the difference values are about 0.5%, about 0.7%, about 0.5%, about 3.5%, and about 0.9%, and the reference value is about 1%, the quantized neural network model cannot replace the neural network model before quantization. In that case, the at least one processor 120 may perform quantization again or may have quantization performed through a server.
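The acceptance rule in this paragraph can be checked with the two numeric examples it gives. The sketch below assumes the difference values are already expressed in percent; the function name `is_reliable` is hypothetical.

```python
from typing import Sequence


def is_reliable(differences_percent: Sequence[float], reference_percent: float = 1.0) -> bool:
    """Accept the quantized model only if every difference is below the reference value."""
    return all(d < reference_percent for d in differences_percent)


if __name__ == "__main__":
    # First example from the description: all differences are below 1%.
    print(is_reliable([0.5, 0.7, 0.5, 0.2, 0.9]))  # True
    # Second example: one difference (3.5%) reaches or exceeds 1%.
    print(is_reliable([0.5, 0.7, 0.5, 3.5, 0.9]))  # False
```

A single difference at or above the reference value is enough to reject the quantized weights, which matches the "at least one difference value" wording of the description.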

Referring to FIG. 11, in operation 1101, according to one embodiment, the at least one processor 120 may obtain a first weight set. According to one embodiment, the at least one processor 120 may perform operation 1101 of FIG. 11 based on the data type of the first weight set being different from the data type supported by the accelerator, the absence of quantization information, and/or the absence of a quantization model. According to one embodiment, the at least one processor 120 may identify a second data type of the accelerator that is different from the first data type of the operator. The at least one processor 120 may identify the absence of quantization information and/or the absence of a quantization model. When identifying that the data type supported by the accelerator is different from the data type of the first weight set, that the quantization information is absent, and/or that the quantization model is absent, the at least one processor 120 may obtain the first weight set from profile information in order to quantize the first weight set.

In operation 1102, according to one embodiment, the at least one processor 120 may obtain a set of sub-output data from profile information. This is because quantization is performed based on the set of sub-output data. According to one embodiment, the at least one processor 120 may obtain a set of sub-output data from a set of input data based on the operation of the first operator. Operation 1102 may be performed similarly to operation 902 of FIG. 9 . Hereinafter, duplicate descriptions will be omitted.

In operation 1103, according to one embodiment, the at least one processor 120 may obtain a second weight set based on the set of sub-output data. According to one embodiment, in order to obtain the second weight set of the second data type, the at least one processor 120 may perform quantization on the first weight set based on the distribution of the set of sub-output data. Quantization of the first weight set of the first operator may be performed based on the interval of the sub-output data. Operation 1103 may be performed similarly to operation 903 of FIG. 9. Hereinafter, duplicate descriptions will be omitted.

At operation 1104, according to one embodiment, the at least one processor 120 may identify whether the reliability is within a reference range. When the reliability is within the reference range, the at least one processor 120 may perform operation 1105. When the reliability is outside the reference range, the at least one processor 120 may perform operation 1106. According to one embodiment, the at least one processor 120 may identify, from profile information generated by executing (or simulating) the neural network model before quantization, a set of first output data obtained by performing an operation for the at least one operator. According to one embodiment, the at least one processor 120 may perform a reliability evaluation on the quantized neural network model based on obtaining the second weight set. For example, the at least one processor 120 may perform the reliability evaluation based on identifying that quantization of the neural network model is complete. The at least one processor 120 may obtain a set of second output data from the set of input data by performing an operation for the at least one operator based on the second weight set. Based on a set of difference values between the first output data and the corresponding second output data, the at least one processor 120 may determine that the quantized neural network has reliability greater than or equal to a specified criterion. This is because, when the difference values between the first output data and the second output data are all less than the reference value, the possibility of a problem occurring is low. When at least one of the difference values between the first output data and the second output data is greater than or equal to the reference value, the at least one processor 120 cannot obtain reliability greater than or equal to the specified criterion for the quantized neural network.

At operation 1105, according to one embodiment, the at least one processor 120 may store the second weight set. When the difference values between the first output data and the corresponding second output data are all less than the reference value, the at least one processor 120 may replace the first weight set with the second weight set. This is because performing calculations for the operator based on the second weight set is advantageous in terms of time resources.

At operation 1106, according to one embodiment, the at least one processor 120 may transmit the set of sub-output data to a server. When at least one of the difference values between the first output data and the second output data is greater than or equal to the reference value, the second weight set cannot replace the first weight set. The at least one processor 120 may perform quantization again or may have quantization performed through the server.

In operation 1107, according to one embodiment, the at least one processor 120 may receive a quantized third weight set from the server. The server may transmit, to the electronic device 101 through the communication circuit, the third weight set obtained by quantizing the first weight set based on the sub-output data. The server may transmit the third weight set to a plurality of electronic devices that have requested quantization according to execution of the application. The at least one processor 120 may store the received third weight set in the memory. When the neural network model is executed, the at least one processor 120 may obtain output data by performing an operation for the first operator based on the third weight set. An operation for the first operator based on the third weight set of the second data type may be executed on an accelerator optimized for machine learning operations, such as an NPU. As a result, calculation speed can be improved and current consumption can be reduced.
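The overall flow of FIG. 11 (quantize on the device, evaluate reliability, and fall back to the server) can be sketched as follows. All names are hypothetical, and the three callables stand in for the on-device quantizer, the reliability evaluation, and the server round trip, none of which are specified at this level of detail in the description.

```python
from typing import Callable, List, Sequence, Tuple


def quantize_with_fallback(
    first_weights: Sequence[float],
    sub_outputs: Sequence[float],
    quantize_locally: Callable[[Sequence[float], Sequence[float]], List[int]],
    evaluate_differences: Callable[[List[int]], Sequence[float]],
    request_server_quantization: Callable[[Sequence[float]], List[int]],
    reference_value: float = 1.0,
) -> Tuple[List[int], str]:
    """Return the weight set to store and where it came from."""
    # Operations 1101 to 1103: quantize on the device using the sub-output data.
    second_weights = quantize_locally(first_weights, sub_outputs)
    # Operation 1104: compare outputs before and after quantization.
    differences = evaluate_differences(second_weights)
    if all(d < reference_value for d in differences):
        # Operation 1105: the on-device result is reliable enough to store.
        return second_weights, "local"
    # Operations 1106 and 1107: send the sub-output data and receive a third weight set.
    third_weights = request_server_quantization(sub_outputs)
    return third_weights, "server"


if __name__ == "__main__":
    weights, source = quantize_with_fallback(
        first_weights=[0.5, -1.0],
        sub_outputs=[-2.0, 2.0],
        quantize_locally=lambda w, s: [25, -50],
        evaluate_differences=lambda q: [0.5, 3.5],
        request_server_quantization=lambda s: [24, -51],
    )
    print(weights, source)
```

Passing the steps in as callables keeps the control flow separate from any particular quantizer or transport, which are outside the scope of this sketch.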

The present disclosure relates to an electronic device and method for quantizing a weight set included in an operator based on the range of sub-output data. The electronic device can improve calculation speed and reduce current consumption by performing quantization for each operator based on the range of sub-output data. When quantization of a model is completed, a reliability evaluation is performed, and quantization is performed on the server for models whose reliability is less than a reference value, thereby reducing the resources that individual electronic devices spend on quantization.

The effects that can be obtained from the present disclosure are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

As described above, the electronic device 101 may include a memory (130; 501) and at least one processor (120; 200; 511). The at least one processor (120; 200; 511) may obtain, from the memory (130; 501), a first weight set of a first data type included in a first operator (613), which is one of at least one operator included in a model (503; 611). The at least one processor (120; 200; 511) may obtain, from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611), a set of sub-output data corresponding to a set of input data (502; 601) used for computation of the first operator (613). The at least one processor (120; 200; 511) may obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data. The at least one processor (120; 200; 511) may store the second weight set in the memory (130; 501) based on obtaining the second weight set.

According to one embodiment, in order to obtain the second weight set of the second data type, the at least one processor (120; 200; 511) may obtain, through the quantization, from the first weight set of the first data type for representing floating point numbers in a first interval, the second weight set of the second data type for representing numbers in a second interval having a length smaller than the length of the first interval.

According to one embodiment, the accelerator may include a circuit for performing an operation based on the second data type, from among the first data type and the second data type.

According to one embodiment, the accelerator may be configured to perform an operation based on the second data type, from among the first data type and the second data type.

According to one embodiment, in response to a request to execute a function associated with the model (503; 611), the at least one processor (120; 200; 511) may additionally perform the operation of the first operator (613) by inputting, to the accelerator, the second weight set from among the first weight set and the second weight set.

According to one embodiment, based on obtaining the second weight set, the at least one processor (120; 200; 511) may identify, from the profile information, a set of first output data (615) obtained by performing an operation for the at least one operator. The at least one processor (120; 200; 511) may obtain a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set. Based on a set of difference values between the first output data (615) and the corresponding second output data (625), the at least one processor (120; 200; 511) may replace the first weight set included in the first operator (613) of the model (503; 611) with the second weight set.

According to one embodiment, the electronic device may additionally include a communication circuit. Based on identifying that at least one difference value among the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value, the at least one processor (120; 200; 511) may additionally transmit the set of sub-output data obtained from the profile information to a server through the communication circuit. The at least one processor (120; 200; 511) may additionally receive, from the server through the communication circuit, a third weight set of the second data type obtained by quantizing the first weight set based on the sub-output data. The at least one processor (120; 200; 511) may additionally store the received third weight set in the memory (130; 501).

According to one embodiment, in order to obtain the second weight set of the second data type, the at least one processor (120; 200; 511) may obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.

According to one embodiment, the at least one processor (120;200;511) may additionally store the second weight set in the memory (130;501) along with the hash value of the first weight set.

According to one embodiment, the at least one processor (120; 200; 511) may additionally perform an operation for the first operator (613) based on the second weight set corresponding to the hash value of the first weight set.

According to one embodiment, the set of sub-output data of the first operator (613) may be a set of input data (502; 601) of a second operator that is one of the at least one operator and is connected to the first operator (613).

According to one embodiment, in order to obtain the first weight set from the memory (130; 501), the at least one processor (120; 200; 511) may identify the second data type supported by the accelerator for computation for an operator. The at least one processor (120; 200; 511) may identify a first data type, different from the second data type, of the first weight set included in the first operator (613). Based on identifying the first data type that is different from the second data type, the at least one processor (120; 200; 511) may obtain, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).

According to one embodiment, in order to obtain the first weight set from the memory (130; 501), the at least one processor (120; 200; 511) may identify the second data type supported by the accelerator for computation for an operator. Based on identifying a first data type, different from the second data type, of the first weight set included in the first operator (613), identifying that the data type of the weight set corresponding to the hash value of the first weight set is the first data type, or identifying that the only weight set corresponding to the hash value of the first weight set is the first weight set, the at least one processor (120; 200; 511) may obtain, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).

According to one embodiment, in order to obtain the first weight set from the memory (130; 501), the at least one processor (120; 200; 511) may identify the second data type supported by the accelerator. The at least one processor (120; 200; 511) may determine whether to perform quantization on the first weight set of the first data type based on identifying a first data type, different from the second data type, of the first weight set included in the first operator (613), and on whether a weight set of the second data type obtained from the first weight set by quantization is stored in the memory or whether weights are included in that weight set.

As described above, according to one embodiment, a method performed by the electronic device 101 may include an operation of obtaining, from the memory (130; 501), a first weight set of a first data type included in a first operator (613), which is one of at least one operator included in a model (503; 611). The method may include an operation of obtaining, from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611), a set of sub-output data corresponding to a set of input data (502; 601) used for computation of the first operator (613). The method may include an operation of obtaining a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data. The method may include an operation of storing the second weight set in the memory (130; 501) based on obtaining the second weight set.

According to one embodiment, the operation of obtaining the second weight set of the second data type may include an operation of obtaining, through the quantization, from the first weight set of the first data type for representing floating point numbers in a first interval, the second weight set of the second data type for representing numbers in a second interval having a length smaller than the length of the first interval.

According to one embodiment, the method may additionally include an operation of performing, in response to a request to execute a function associated with the model (503; 611), the operation of the first operator (613) by inputting, to the accelerator, the second weight set from among the first weight set and the second weight set.

According to one embodiment, the method may include an operation of identifying, based on obtaining the second weight set, a set of first output data (615) obtained from the profile information by performing an operation for the at least one operator. The method may include an operation of obtaining a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set. The method may include an operation of replacing, based on a set of difference values between the first output data (615) and the corresponding second output data (625), the first weight set included in the first operator (613) of the model (503; 611) with the second weight set.

According to one embodiment, the method may additionally include an operation of transmitting the set of sub-output data obtained from the profile information to a server through a communication circuit, based on identifying that at least one difference value of the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value. The method may additionally include an operation of receiving, from the server through the communication circuit, a third weight set of the second data type obtained by quantizing the first weight set based on the sub-output data. The method may additionally include an operation of storing the received third weight set in the memory (130; 501).

According to one embodiment, the operation of obtaining the second weight set of the second data type may include an operation of obtaining the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.

According to one embodiment, the method may additionally include an operation of storing the second weight set in the memory (130; 501) together with the hash value of the first weight set.

According to one embodiment, the method may additionally include an operation of performing an operation for the first operator (613) based on the second weight set corresponding to the hash value of the first weight set.

According to one embodiment, the operation of obtaining the first weight set from the memory (130; 501) may include an operation of identifying the second data type supported by the accelerator for computation for an operator. The operation of obtaining the first weight set may include an operation of identifying a first data type, different from the second data type, of the first weight set included in the first operator (613). The operation of obtaining the first weight set may include an operation of obtaining, from the memory (130; 501), based on identifying the first data type that is different from the second data type, the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611).

According to one embodiment, the operation of obtaining the first weight set from the memory (130; 501) may include an operation of identifying the second data type supported by the accelerator for computation for an operator. The operation of obtaining the first weight set may include an operation of obtaining, from the memory (130; 501), the first weight set of the first data type included in the first operator (613), which is one of at least one operator included in the model (503; 611), based on identifying a first data type, different from the second data type, of the first weight set included in the first operator (613), identifying that the data type of the weight set corresponding to the hash value of the first weight set is the first data type, or identifying that the only weight set corresponding to the hash value of the first weight set is the first weight set.

According to one embodiment, the operation of obtaining the first weight set from the memory (130; 501) may include an operation of identifying the second data type supported by the accelerator. The operation of obtaining the first weight set may include an operation of identifying a first data type, different from the second data type, of the first weight set included in the first operator (613). The operation of obtaining the first weight set may include an operation of determining whether to perform quantization on the first weight set of the first data type based on whether a weight set of the second data type obtained from the first weight set by quantization is stored in the memory or whether weights are included in that weight set.

According to one embodiment, the operation of acquiring the second weight set of the second data type is by performing quantization on the first weight set based on the section in which the values of the set of sub-output data are distributed. It may include an operation of obtaining the second weight set.

According to one embodiment, the operation of obtaining the second weight set of the second data type may include an operation of determining a quantization level for the second data type using the size of the section in which the values of the set of sub-output data are distributed, and an operation of performing quantization on the first weight set according to the quantization level.

According to one embodiment, the range of numbers that can be expressed according to the second data type and the quantization level may include a section of the sub-output data set.
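The condition that the representable range includes the section of the sub-output data can be checked directly. The sketch below interprets the quantization level as a step size for a signed integer type; that interpretation and the function names `representable_range` and `covers_sub_outputs` are assumptions, not terms defined in the description.

```python
from typing import Sequence, Tuple


def representable_range(level: float, num_bits: int = 8) -> Tuple[float, float]:
    """Real-valued range expressible by a signed integer type with step size `level`."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return qmin * level, qmax * level


def covers_sub_outputs(sub_outputs: Sequence[float], level: float, num_bits: int = 8) -> bool:
    """Check that every observed sub-output value lies inside the representable range."""
    low, high = representable_range(level, num_bits)
    return low <= min(sub_outputs) and max(sub_outputs) <= high


if __name__ == "__main__":
    level = 0.02  # hypothetical quantization level
    print(covers_sub_outputs([-2.0, 2.5], level))
    print(covers_sub_outputs([-2.0, 3.0], level))
```

If the check fails, a coarser level (larger step) would be needed so that the observed section is not clipped.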

As described above, according to one embodiment, in a computer readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to obtain, from the memory (130; 501), a first weight set of a first data type included in a first operator (613), which is one of at least one operator included in a model (503; 611). The one or more programs may include instructions that cause the electronic device to obtain, from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611), a set of sub-output data corresponding to a set of input data (502; 601) used for computation of the first operator (613). The one or more programs may include instructions that cause the electronic device to obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data. The one or more programs may include instructions that cause the electronic device to store the second weight set in the memory (130; 501) based on obtaining the second weight set.

According to one embodiment, in a computer readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device, in order to obtain the second weight set of the second data type, to obtain, through the quantization, from the first weight set of the first data type for representing floating point numbers in a first interval, the second weight set of the second data type for representing numbers in a second interval having a length smaller than the length of the first interval.

According to one embodiment, in a computer readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the accelerator of the electronic device to perform an operation based on the second data type, from among the first data type and the second data type.

According to one embodiment, in a computer readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device, in response to a request to execute a function associated with the model (503; 611), to perform the operation of the first operator (613) by inputting, to the accelerator, the second weight set from among the first weight set and the second weight set.

According to one embodiment, in a computer readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to identify, based on obtaining the second weight set, a set of first output data (615) obtained from the profile information by performing an operation for the at least one operator, to obtain a set of second output data (625) from the set of input data (502; 601) by performing an operation for the at least one operator based on the second weight set, and to replace, based on a set of difference values between the first output data (615) and the corresponding second output data (625), the first weight set included in the first operator (613) of the model (503; 611) with the second weight set.

According to one embodiment, in a computer readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to transmit the set of sub-output data obtained from the profile information to a server through a communication circuit, based on identifying that at least one difference value of the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value, to receive, from the server through the communication circuit, a third weight set of the second data type obtained by quantizing the first weight set based on the sub-output data, and to store the received third weight set in the memory (130; 501).

According to one embodiment, in a computer-readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to, in order to obtain the second weight set of the second data type, obtain the second weight set by performing quantization on the first weight set based on a distribution of the set of sub-output data.

According to one embodiment, in a computer-readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to store the second weight set in the memory (130; 501) together with a hash value of the first weight set.

According to one embodiment, in a computer-readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the operation of the first operator (613) based on the second weight set corresponding to the hash value of the first weight set.

According to one embodiment, in a computer-readable storage medium storing one or more programs, the set of sub-output data of the first operator (613) may be a set of input data (502; 601) of a second operator that is one of the at least one operator and is connected to the first operator (613).
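
As a non-limiting illustration, the relationship between the sub-output data of one operator and the input data of the next operator can be sketched as follows. The sketch assumes a simple linear chain of operators; the function name `run_with_profile` and the dictionary layout of the profile entries are hypothetical and are not taken from the disclosure.

```python
from typing import Callable, List, Sequence, Tuple

Operator = Tuple[str, Callable[[Sequence[float]], List[float]]]


def run_with_profile(operators: Sequence[Operator], inputs: Sequence[float]):
    """Run a linear chain of operators and record per-operator profile entries.

    Assumption: operators are connected one after another, so the
    sub-output of operator i is exactly the input of operator i + 1.
    """
    profile = []
    data = list(inputs)
    for name, fn in operators:
        sub_output = fn(data)
        # Each entry keeps the operator's input and its sub-output.
        profile.append({"operator": name, "input": data, "sub_output": sub_output})
        data = sub_output
    return data, profile


if __name__ == "__main__":
    ops = [
        ("scale", lambda xs: [2.0 * x for x in xs]),
        ("shift", lambda xs: [x + 1.0 for x in xs]),
    ]
    result, profile = run_with_profile(ops, [1.0, 2.0])
    # The sub-output of the first operator is the input of the second.
    print(profile[0]["sub_output"] == profile[1]["input"])
```

Recording both the input and the sub-output per operator is one possible layout; it lets a later quantization step look up the observed value range of any single operator without re-running the whole model.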

According to one embodiment, in a computer-readable storage medium storing one or more programs, the one or more programs may include instructions that, when executed by a processor of an electronic device, cause the electronic device to, in order to obtain the first weight set: identify the second data type supported by the accelerator; and determine whether to perform quantization on the first weight set of the first data type based on whether the first data type of the first weight set included in the first operator (613) is identified as different from the second data type, whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory (130; 501), or whether a weight is included in the weight set of the second data type.

As described above, according to one embodiment, the electronic device (101) may include a memory (130; 501) that stores instructions and at least one processor (120; 200; 511). The instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to: obtain, from the memory (130; 501), a first weight set of a first data type included in a first operator (613), which is one of at least one operator included in the model (503; 611); obtain, from profile information stored in the memory (130; 501) and generated by execution of the model (503; 611), a set of sub-output data corresponding to the set of input data (502; 601) used for computation of the first operator (613); obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data; and, based on obtaining the second weight set, store the second weight set in the memory (130; 501).

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to, in order to obtain the weight set of the second data type, obtain, through the quantization, the second weight set of the second data type for expressing numbers in a second interval from the first weight set of the first data type for expressing floating-point numbers in a first interval, the second interval having a length smaller than the length of the first interval.

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to, in response to a request to execute a function related to the model (503; 611), perform the operation of the first operator (613) by inputting the second weight set, from among the first weight set and the second weight set, to the accelerator.

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to, based on obtaining the second weight set: identify, from the profile information, a set of first output data (615) obtained by performing the operation of the at least one operator; obtain a set of second output data (625) from the set of input data (502; 601) by performing the operation of the at least one operator based on the second weight set; and, based on a set of difference values between the first output data (615) and the corresponding second output data (625), replace the first weight set included in the first operator (613) of the model (503; 611) with the second weight set.

According to one embodiment, the electronic device (101) may include a communication circuit. The instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to, based on identifying that at least one difference value in the set of difference values between the first output data (615) and the corresponding second output data (625) is greater than or equal to a reference value: transmit the set of sub-output data obtained from the profile information to a server through the communication circuit; receive, from the server through the communication circuit, a third weight set of the second data type obtained by quantizing the first weight set based on the sub-output data; and store the received third weight set in the memory (130; 501).
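
As a non-limiting illustration, the comparison between the first output data and the second output data can be sketched as follows. The sketch assumes that the difference value is the element-wise absolute difference, and that the first weight set is replaced only when every difference value is below the reference value; both are assumptions, since the disclosure does not fix the difference metric. The function names and the returned action strings are hypothetical.

```python
from typing import Sequence


def difference_values(first_output: Sequence[float],
                      second_output: Sequence[float]) -> list:
    """Element-wise absolute differences (assumed metric)."""
    if len(first_output) != len(second_output):
        raise ValueError("outputs must have the same length")
    return [abs(a - b) for a, b in zip(first_output, second_output)]


def choose_action(first_output: Sequence[float],
                  second_output: Sequence[float],
                  reference_value: float) -> str:
    """Decide what to do with a locally quantized weight set.

    Returns "request_server" when at least one difference value is greater
    than or equal to the reference value, otherwise "replace".
    """
    diffs = difference_values(first_output, second_output)
    if any(d >= reference_value for d in diffs):
        # Local quantization is judged too lossy: send the sub-output
        # data to a server and wait for a third weight set.
        return "request_server"
    # Quantized outputs are close enough: use the second weight set.
    return "replace"


if __name__ == "__main__":
    print(choose_action([1.0, 2.0], [1.01, 2.02], reference_value=0.1))
    print(choose_action([1.0, 2.0], [1.0, 2.5], reference_value=0.1))
```

Using a per-element threshold mirrors the "at least one difference value" wording; an aggregate metric such as a mean error would be a different design choice.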

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to, in order to obtain the second weight set of the second data type, obtain the second weight set by performing quantization on the first weight set based on the distribution of the set of sub-output data.

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to store the second weight set in the memory (130; 501) together with the hash value of the first weight set.

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to perform the operation of the first operator (613) based on the second weight set corresponding to the hash value of the first weight set.
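
As a non-limiting illustration, storing the second weight set together with a hash value of the first weight set, and later looking it up by that hash value, can be sketched as follows. The disclosure does not name a hash function or a storage layout; SHA-256 over a little-endian float32 encoding and an in-memory dictionary are assumptions, and the class name `QuantizedWeightStore` is hypothetical.

```python
import hashlib
import struct
from typing import Dict, List, Optional, Sequence


def weight_hash(weights: Sequence[float]) -> str:
    """Hash of a weight set (assumed: SHA-256 over float32 bytes)."""
    data = struct.pack(f"<{len(weights)}f", *weights)
    return hashlib.sha256(data).hexdigest()


class QuantizedWeightStore:
    """Maps the hash of a first weight set to its quantized second weight set."""

    def __init__(self) -> None:
        self._store: Dict[str, List[int]] = {}

    def put(self, first_weights: Sequence[float],
            second_weights: Sequence[int]) -> str:
        key = weight_hash(first_weights)
        self._store[key] = list(second_weights)
        return key

    def get(self, first_weights: Sequence[float]) -> Optional[List[int]]:
        # Returns None when no quantized set was stored for these weights.
        return self._store.get(weight_hash(first_weights))


if __name__ == "__main__":
    store = QuantizedWeightStore()
    store.put([0.5, -0.25, 1.0], [64, -32, 127])
    print(store.get([0.5, -0.25, 1.0]))
    print(store.get([0.5, -0.25, 0.0]))
```

Keying by a hash of the original weights means a changed first weight set (for example, after a model update) no longer matches the stored entry, so a stale quantized set is not reused.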

According to one embodiment, the instructions, when executed by the at least one processor (120; 200; 511), may cause the electronic device (101) to, in order to obtain the first weight set: identify the second data type supported by the accelerator; and determine whether to perform quantization on the first weight set of the first data type based on whether the first data type of the first weight set included in the first operator (613) is identified as different from the second data type, whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory (130; 501), or whether a weight is included in the weight set of the second data type.
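
As a non-limiting illustration, the determination of whether to perform quantization can be sketched as follows. How the conditions are combined is an assumption: the sketch quantizes only when the data types differ and no non-empty weight set of the second data type is already stored. The function name and the string data-type labels are hypothetical.

```python
from typing import Optional, Sequence


def should_quantize(first_data_type: str,
                    accelerator_data_type: str,
                    stored_second_weights: Optional[Sequence[int]]) -> bool:
    """Decide whether the first weight set needs to be quantized.

    stored_second_weights is None when nothing is stored in memory for
    this first weight set, and may be empty when an entry exists but
    contains no weights.
    """
    if first_data_type == accelerator_data_type:
        # The accelerator already supports the weights as they are.
        return False
    if stored_second_weights is not None and len(stored_second_weights) > 0:
        # A usable quantized weight set is already stored.
        return False
    return True


if __name__ == "__main__":
    print(should_quantize("float32", "int8", None))
    print(should_quantize("float32", "int8", [64, -32, 127]))
    print(should_quantize("int8", "int8", None))
```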

According to one embodiment, the instructions, when executed by the at least one processor, may cause the electronic device to, in order to obtain the second weight set of the second data type, obtain the second weight set by performing quantization on the first weight set based on the interval over which the values of the set of sub-output data are distributed.

According to one embodiment, the instructions, when executed by the at least one processor, may cause the electronic device to, in order to obtain the second weight set of the second data type, determine a quantization level for the second data type using the size of the interval over which the values of the set of sub-output data are distributed, and perform quantization on the first weight set according to the quantization level.
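
As a non-limiting illustration, determining a quantization level from the interval of the sub-output data and quantizing the first weight set accordingly can be sketched as follows. The sketch makes several assumptions that the disclosure does not state: the second data type is a signed 8-bit integer, the quantization level is interpreted as a symmetric step size (scale), and weights outside the representable range are clamped. All function names are hypothetical.

```python
from typing import List, Sequence


def quantization_scale(sub_outputs: Sequence[float], qmax: int = 127) -> float:
    """Step size chosen so that qmax steps reach the largest observed magnitude.

    Assumption: the quantization level is a symmetric scale derived from
    the interval [min, max] of the profiled sub-output values.
    """
    if not sub_outputs:
        raise ValueError("sub_outputs must not be empty")
    bound = max(abs(min(sub_outputs)), abs(max(sub_outputs)))
    if bound == 0:
        return 1.0  # degenerate interval: any positive scale works
    return bound / qmax


def quantize_weights(weights: Sequence[float], scale: float,
                     qmin: int = -128, qmax: int = 127) -> List[int]:
    """Round each weight to the nearest step and clamp to the int8 range."""
    return [max(qmin, min(qmax, round(w / scale))) for w in weights]


def dequantize(qweights: Sequence[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate real values."""
    return [q * scale for q in qweights]


if __name__ == "__main__":
    scale = quantization_scale([-2.0, 0.5, 4.0])
    q = quantize_weights([4.0, -4.0, 0.0, 100.0], scale)
    print(q)
    print(dequantize(q, scale))
```

Deriving the scale from the observed interval rather than from the full range of the first data type keeps the step size small when the profiled values occupy only a narrow range, at the cost of clamping any value outside that range.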

Methods according to embodiments described in the claims or specification of the present disclosure may be implemented in the form of hardware, software, or a combination of hardware and software.

When implemented as software, a computer-readable storage medium that stores one or more programs (software modules) may be provided. The one or more programs stored in the computer-readable storage medium are configured for execution by one or more processors in an electronic device. The one or more programs include instructions that cause the electronic device to execute methods according to embodiments described in the claims or specification of the present disclosure. The one or more programs may be included and provided in a computer program product. Computer program products are commodities and can be traded between sellers and buyers. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online, either through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored or temporarily created in a machine-readable storage medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

These programs (software modules, software) may be stored in random access memory, non-volatile memory including flash memory, read only memory (ROM), electrically erasable programmable read only memory (EEPROM), a magnetic disc storage device, a compact disc-ROM (CD-ROM), digital versatile discs (DVDs) or other types of optical storage devices, or a magnetic cassette. Alternatively, they may be stored in a memory consisting of a combination of some or all of these. Additionally, a plurality of each constituent memory may be included.

In addition, the program may be stored on an attachable storage device that is accessible through a communication network such as the Internet, an intranet, a local area network (LAN), a wide area network (WAN), or a storage area network (SAN), or a combination thereof. This storage device can be connected to a device performing an embodiment of the present disclosure through an external port. Additionally, a separate storage device on a communication network may be connected to the device performing embodiments of the present disclosure.

In the specific embodiments of the present disclosure described above, elements included in the disclosure are expressed in the singular or the plural depending on the specific embodiment presented. However, the singular or plural expression is selected to suit the presented situation for convenience of explanation, and the present disclosure is not limited to singular or plural components. Even components expressed in the plural may be composed of a single element, and even components expressed in the singular may be composed of plural elements.

According to embodiments, one or more of the components or operations described above may be omitted, or one or more other components or operations may be added. Alternatively or additionally, multiple components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to the way they were performed by the corresponding component prior to the integration. According to embodiments, operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

Meanwhile, in the detailed description of the present disclosure, specific embodiments have been described, but various modifications may be made without departing from the scope of the present disclosure.

Claims

An electronic device comprising:

a memory storing instructions; and

at least one processor,

wherein the instructions, when executed by the at least one processor, cause the electronic device to:

obtain, from the memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model;

obtain, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator;

obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data; and

store the second weight set in the memory based on obtaining the second weight set.
The electronic device of claim 1,

wherein the instructions, when executed by the at least one processor, cause the electronic device to, in order to obtain the weight set of the second data type:

obtain, through the quantization, the second weight set of the second data type for expressing numbers in a second interval from the first weight set of the first data type for expressing floating-point numbers in a first interval, the second interval having a length smaller than a length of the first interval.
The electronic device of claims 1 to 2,

wherein the accelerator is configured to perform an operation based on the second data type, from among the first data type and the second data type.
The electronic device of claims 1 to 3,

wherein the instructions, when executed by the at least one processor, cause the electronic device to:

in response to a request to execute a function related to the model, perform the operation of the first operator by inputting the second weight set, from among the first weight set and the second weight set, to the accelerator.
The electronic device of claims 1 to 4,

wherein the instructions, when executed by the at least one processor, cause the electronic device to, based on obtaining the second weight set:

identify, from the profile information, a set of first output data obtained by performing the operation of the at least one operator;

obtain a set of second output data from the set of input data by performing the operation of the at least one operator based on the second weight set; and

replace the first weight set included in the first operator of the model with the second weight set, based on a set of difference values between the first output data and the corresponding second output data.
The electronic device of claims 1 to 5,

further comprising a communication circuit,

wherein the instructions, when executed by the at least one processor, cause the electronic device to, based on identifying that at least one difference value in the set of difference values between the first output data and the corresponding second output data is greater than or equal to a reference value:

transmit the set of sub-output data obtained from the profile information to a server through the communication circuit;

receive, from the server through the communication circuit, a third weight set of the second data type obtained by quantizing the first weight set based on the sub-output data; and

store the received third weight set in the memory.
The electronic device of claims 1 to 6,

wherein the instructions, when executed by the at least one processor, cause the electronic device to, in order to obtain the second weight set of the second data type:

obtain the second weight set by performing quantization on the first weight set based on an interval over which the values of the set of sub-output data are distributed.
The electronic device of claim 7,

wherein the instructions, when executed by the at least one processor, cause the electronic device to, in order to obtain the second weight set of the second data type:

determine a quantization level for the second data type using the size of the interval over which the values of the set of sub-output data are distributed; and

perform quantization on the first weight set according to the quantization level.
The electronic device of claim 8,

wherein the range of numbers that can be expressed according to the second data type and the quantization level includes the interval of the set of sub-output data.
The electronic device of claims 1 to 9,

wherein the instructions, when executed by the at least one processor, cause the electronic device to:

store the second weight set in the memory together with a hash value of the first weight set.
The electronic device of claims 1 to 10,

wherein the instructions, when executed by the at least one processor, cause the electronic device to:

perform the operation of the first operator based on the second weight set corresponding to the hash value of the first weight set.
The electronic device of claims 1 to 11,

wherein the set of sub-output data of the first operator is a set of input data of a second operator that is one of the at least one operator and is connected to the first operator.
The electronic device of claims 1 to 12,

wherein the instructions, when executed by the at least one processor, cause the electronic device to, in order to obtain the first weight set:

identify the second data type supported by the accelerator; and

determine whether to perform quantization on the first weight set of the first data type based on whether the first data type of the first weight set included in the first operator is identified as different from the second data type, whether a weight set of the second data type obtained from the first weight set through quantization is stored in the memory, or whether a weight is included in the weight set of the second data type.
A method performed by an electronic device (101), the method comprising:

obtaining, from a memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model;

obtaining, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator;

obtaining a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data; and

storing the second weight set in the memory based on obtaining the second weight set.
A computer-readable storage medium storing one or more programs,

wherein the one or more programs include instructions that, when executed by a processor of an electronic device, cause the electronic device to:

obtain, from a memory, a first weight set of a first data type included in a first operator, which is one of at least one operator included in a model;

obtain, from profile information stored in the memory and generated by execution of the model, a set of sub-output data corresponding to a set of input data used for computation of the first operator;

obtain a second weight set of a second data type supported by at least one accelerator by performing quantization on the first weight set based on the set of sub-output data; and

store the second weight set in the memory based on obtaining the second weight set.