TWI749331B - Memory with processing in memory architecture and operating method thereof - Google Patents
- Publication number
- TWI749331B
- Authority
- TW
- Taiwan
- Prior art keywords
- memory
- artificial intelligence
- core
- data
- special function
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Neurology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Memory System (AREA)
Abstract
Description
The present invention relates to a circuit architecture, and in particular to a memory with a processing-in-memory (PIM) architecture and an operating method thereof.
As artificial intelligence (AI) computing evolves, its applications are becoming increasingly widespread. For example, neural network models are used for neural network operations such as image data analysis, voice data analysis, and natural language processing. Moreover, as the computational complexity of neural networks keeps rising, the computer equipment currently used to perform AI operations is gradually becoming unable to meet the demands of neural network computation and to deliver effective, fast performance.
Dedicated processing cores have therefore been designed to perform neural network operations. Although executing neural network operations independently on a dedicated processing core lets that core make full use of its computing capability, its processing speed is still limited by data access speed. Because the dedicated processing core and other special function processing cores read data from memory over the same general-purpose bus, the dedicated processing core cannot obtain the data it needs for AI operations in time whenever the other special function processing cores occupy the bus. In view of this, several embodiments are presented below as solutions for a processing architecture that can execute AI operations quickly.
The present invention provides a memory with a processing-in-memory architecture and an operating method thereof, in which an artificial intelligence (AI) core integrated in the memory directly reads the data needed for neural network operations from the memory chip, so that neural network operations can be performed quickly.
The memory with a processing-in-memory architecture of the present invention includes a memory array, a mode register, a memory interface, and an artificial intelligence core. The memory array includes a plurality of memory regions. The mode register stores a plurality of memory mode settings. The memory interface is coupled to the memory array and the mode register, and is externally coupled to a special function processing core. The artificial intelligence core is coupled to the memory array and the mode register. According to the memory mode settings of the mode register, the memory regions are selectively addressed to the special function processing core and the artificial intelligence core, respectively, so that the two cores access different memory regions of the memory array according to the memory mode settings.
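The relationship between the mode register and the memory regions can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `PimMemory`, the enum `Owner`, and the region names `region_a` and `region_b` are hypothetical and do not appear in the patent, and real hardware would enforce the mapping in address decoding rather than with exceptions.

```python
from enum import Enum


class Owner(Enum):
    SPECIAL_FUNCTION_CORE = "special function processing core"
    AI_CORE = "artificial intelligence core"


class PimMemory:
    """Toy model: a mode register decides which core each region is addressed to."""

    def __init__(self, mode_settings):
        # mode_settings maps a region name to its Owner; it plays the role
        # of the memory mode settings held in the mode register.
        self.mode_register = dict(mode_settings)
        self.regions = {name: {} for name in mode_settings}

    def _check(self, core, region):
        # A core may only touch regions currently addressed to it.
        if self.mode_register[region] is not core:
            raise PermissionError(f"{region} is not addressed to the {core.value}")

    def write(self, core, region, address, value):
        self._check(core, region)
        self.regions[region][address] = value

    def read(self, core, region, address):
        self._check(core, region)
        return self.regions[region][address]


memory = PimMemory({
    "region_a": Owner.AI_CORE,                # hypothetical region names
    "region_b": Owner.SPECIAL_FUNCTION_CORE,
})
memory.write(Owner.AI_CORE, "region_a", 0, "weights")
memory.write(Owner.SPECIAL_FUNCTION_CORE, "region_b", 0, "frame")
```

Because each region has exactly one owner at a time, the two cores in this model never contend for the same region, which is the property the text relies on.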
In an embodiment of the present invention, the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array via their own dedicated memory buses.
In an embodiment of the present invention, the plurality of memory regions include a first memory region and a second memory region. The first memory region is for exclusive access by the artificial intelligence core. The second memory region is for exclusive access by the special function processing core.
In an embodiment of the present invention, the plurality of memory regions further include a plurality of data buffer regions. The artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.
In an embodiment of the present invention, when the artificial intelligence core performs a neural network operation, it reads input data from one of the data buffer regions as input parameters and reads weight data from the first memory region. The artificial intelligence core outputs feature data to the first memory region.
In an embodiment of the present invention, when the artificial intelligence core performs a neural network operation, it reads the feature data from the first memory region as the next input parameters and reads further weight data from the first memory region. The artificial intelligence core outputs the next feature map data to one of the data buffer regions, overwriting that data buffer region.
In an embodiment of the present invention, the data buffer regions can be alternately addressed to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.
In an embodiment of the present invention, the width of the bus dedicated to the artificial intelligence core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.
In an embodiment of the present invention, the memory regions respectively correspond to a plurality of row buffer blocks, and each memory region includes a plurality of memory banks. The width of the bus dedicated to the artificial intelligence core and the memory regions is greater than or equal to the amount of data in one entire row of the memory banks.
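The effect of the bus-width condition can be shown with simple arithmetic. The sketch below is illustrative only: the row size, bank count, and bus widths are hypothetical numbers chosen for the example, not values from the patent.

```python
import math


def transfers_per_row(row_bits_per_bank, banks, bus_width_bits):
    """Number of bus transfers needed to move one full row across all banks."""
    row_bits = row_bits_per_bank * banks
    return math.ceil(row_bits / bus_width_bits)


# Hypothetical numbers, chosen only for illustration.
ROW_BITS_PER_BANK = 1024
BANKS = 4

# Internal dedicated bus at least as wide as one full row of all banks.
internal = transfers_per_row(ROW_BITS_PER_BANK, BANKS, bus_width_bits=4096)
# Narrower external bus behind the memory interface.
external = transfers_per_row(ROW_BITS_PER_BANK, BANKS, bus_width_bits=64)
print(internal, external)
```

When the dedicated bus is at least as wide as a full row, the whole row moves in a single transfer, whereas the narrower external bus needs many transfers for the same data.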
In an embodiment of the present invention, the memory is a dynamic random access memory (DRAM) chip.
The operating method of the present invention is suited to a memory with a processing-in-memory architecture that includes a memory array, a mode register, a memory interface, and an artificial intelligence core. The method includes the following steps: selectively addressing a plurality of memory regions of the memory array to a special function processing core and the artificial intelligence core, respectively, according to a plurality of memory mode settings of the mode register; and accessing, by the special function processing core and the artificial intelligence core, different memory regions of the memory array according to the memory mode settings.
In an embodiment of the present invention, the special function processing core and the artificial intelligence core simultaneously access different memory regions of the memory array via their own dedicated memory buses.
In an embodiment of the present invention, the plurality of memory regions include a first memory region and a second memory region, the first memory region is for exclusive access by the artificial intelligence core, and the second memory region is for exclusive access by the special function processing core.
In an embodiment of the present invention, the plurality of memory regions further include a plurality of data buffer regions, and the artificial intelligence engine and the memory interface alternately access different data in the data buffer regions.
In an embodiment of the present invention, when the artificial intelligence core performs a neural network operation, the step of accessing different memory regions of the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register includes: reading, by the artificial intelligence core, input data from one of the data buffer regions as input parameters; reading, by the artificial intelligence core, weight data from the first memory region; and outputting, by the artificial intelligence core, feature data to the first memory region.
In an embodiment of the present invention, when the artificial intelligence core performs a neural network operation, the step of accessing different memory regions of the memory array by the special function processing core and the artificial intelligence core according to the memory mode settings of the mode register further includes: reading, by the artificial intelligence core, the feature data from the first memory region as the next input parameters; reading, by the artificial intelligence core, further weight data from the first memory region; and outputting, by the artificial intelligence core, the next feature map data to one of the data buffer regions, overwriting that data buffer region.
In an embodiment of the present invention, the data buffer regions can be alternately addressed to the special function processing core and the artificial intelligence core, so that a first memory space corresponding to the artificial intelligence core includes the first memory region and one of the data buffer regions, and a second memory space corresponding to the special function processing core includes the second memory region and another of the data buffer regions.
In an embodiment of the present invention, the width of the bus dedicated to the artificial intelligence core and the memory regions is greater than the width of the external bus between the special function processing core and the memory interface.
In an embodiment of the present invention, the memory regions respectively correspond to a plurality of row buffer blocks, and each memory region includes a plurality of memory banks. The width of the bus dedicated to the artificial intelligence core and the memory regions is greater than or equal to the amount of data in one entire row of the memory banks.
In an embodiment of the present invention, the memory is a dynamic random access memory (DRAM) chip.
Based on the above, the memory and operating method of the present invention allow the external special function processing core and the artificial intelligence core disposed in the memory to access different memory regions of the memory array simultaneously. The memory of the present invention can therefore perform neural network operations quickly.
To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
To make the content of the present invention easier to understand, the following embodiments are given as examples by which the invention can actually be implemented. Wherever possible, elements, components, and steps with the same reference numerals in the drawings and embodiments denote the same or similar parts.
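FIG. 1 is a block diagram of a memory according to an embodiment of the present invention. Referring to FIG. 1, the memory 100 includes a memory array 110, a mode register 120, an artificial intelligence (AI) core 130, and a memory interface 140. The memory array 110 is coupled to the artificial intelligence core 130 and the memory interface 140. The mode register 120 is coupled to the memory array 110, the artificial intelligence core 130, and the memory interface 140. The memory array 110 includes a plurality of memory regions, each used to store specific data (also called a dataset). In an embodiment, the memory 100 may further include a plurality of dedicated memory control units, which correspond one-to-one to the memory regions and each perform data access operations. In this embodiment, the memory interface 140 may be externally coupled to a special function processing core. According to a plurality of memory mode settings recorded in the mode register 120, the memory regions are selectively addressed to the special function processing core and the artificial intelligence core 130, respectively, so that the two cores can access different memory regions of the memory array 110 according to those settings. The memory 100 of this embodiment is thus capable of performing AI operations.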
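In this embodiment, the memory 100 may be a dynamic random access memory (DRAM) chip and may, for example, be a processing-in-memory (PIM) architecture built from circuit elements such as control logic, arithmetic logic, and cache units. The artificial intelligence core 130 may be integrated in the peripheral circuit area of the memory 100 so that it accesses the memory banks of the memory array 110 directly through a dedicated memory controller and a dedicated bus. The artificial intelligence core 130 may be designed in advance with the functions and characteristics needed to perform specific neural network operations. In other words, the memory 100 of this embodiment can perform AI operations, and the artificial intelligence core 130 and the external special function processing core can access the memory array 110 at the same time, providing efficient data access and computation.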
In this embodiment, the special function processing core may be, for example, a central processing unit (CPU) core, an image signal processor (ISP) core, a digital signal processor (DSP) core, a graphics processing unit (GPU) core, or another similar special function processing core. The special function processing core is coupled to the memory interface 140 via a general-purpose (or standard) bus and accesses the memory array 110 through the memory interface 140. The artificial intelligence core 130, by contrast, accesses the memory array 110 via a dedicated bus inside the memory, so it is not limited by the width or speed of the memory interface 140, and it can access the memory array 110 quickly according to a specific data access pattern.
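FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the present invention. Referring to FIG. 2, the memory 200 includes memory regions 211 and 213, row buffer blocks 212 and 214, a mode register 220, an artificial intelligence core 230, and a memory interface 240. In this embodiment, the mode register 220 is coupled to the artificial intelligence core 230 and the memory interface 240 and provides a plurality of memory mode settings to each of them. The artificial intelligence core 230 and the memory interface 240 operate independently to access the memory array. The memory array includes the memory regions 211 and 213 and the row buffer blocks 212 and 214. The memory regions 211 and 213 each include a plurality of memory banks and may be data buffer regions. In this embodiment, the memory interface 240 is externally coupled to another memory interface 340. The memory interface 340 is coupled, for example via a bus, to a central processing unit core 351, a graphics processing unit core 352, and a digital signal processor core 353.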
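In this embodiment, when the central processing unit core 351, the graphics processing unit core 352, and the digital signal processor core 353 need to access the row buffer block 212 or the row buffer block 214, they must do so through the memory interfaces 240 and 340, in sequence or in queue order. However, regardless of how these special function processing cores are currently accessing the memory array, the artificial intelligence core 230 can simultaneously access a different memory region of the memory array. In an embodiment, the memory region 211 or the memory region 213 may be used, for example, to access the digitized input data, weight data, or feature map data needed for neural network operations or other machine learning operations.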
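Notably, the special function processing cores and the artificial intelligence core 230 simultaneously access different memory regions of the memory array via their own dedicated memory buses. That is, when the special function processing cores access data in the memory region 211 via the row buffer block 212, the artificial intelligence core 230 can access data in the memory region 213 via the row buffer block 214. Likewise, when the special function processing cores access data in the memory region 213 via the row buffer block 214, the artificial intelligence core 230 can access data in the memory region 211 via the row buffer block 212. In other words, the special function processing cores and the artificial intelligence core 230 can alternately access different data in the memory regions 211 and 213, which serve as data buffer regions. In addition, in an embodiment, the artificial intelligence core 230 may further include a plurality of caches or queues, through which it can quickly access data in the memory region 211 or the memory region 213 in a pipelined manner.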
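The alternating use of the memory regions 211 and 213 can be sketched as a simple schedule. This is a minimal illustration, assuming a strict step-by-step alternation; the function name `ping_pong_schedule` is hypothetical, and the patent does not prescribe a fixed schedule.

```python
def ping_pong_schedule(steps, regions=("211", "213")):
    """Yield (step, region used by the external cores, region used by the AI core)."""
    for step in range(steps):
        external = regions[step % 2]          # region reached via the memory interface
        ai = regions[(step + 1) % 2]          # the other region, reached via the AI core's bus
        yield step, external, ai


schedule = list(ping_pong_schedule(4))
for step, external, ai in schedule:
    print(f"step {step}: external cores -> region {external}, AI core -> region {ai}")
```

At every step the two sides use different regions, so neither has to wait for the other.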
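FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the present invention. Referring to FIG. 3, the memory 400 of this embodiment includes memory regions 411, 413, 415, and 417, row buffer blocks 412, 414, 416, and 418, a mode register 420, an artificial intelligence core 430, and a memory interface 440. In this embodiment, the mode register 420 is coupled to the artificial intelligence core 430 and the memory interface 440 and provides a plurality of memory mode settings to each of them. The memory interface 440 is coupled, for example via a bus, to the central processing unit core 351, the graphics processing unit core 352, and the digital signal processor core 353. The artificial intelligence core 430 and the memory interface 440 operate independently to access the memory array. The memory array includes the memory regions 411, 413, 415, and 417 and the row buffer blocks 412, 414, 416, and 418, and each of the memory regions 411, 413, 415, and 417 includes a plurality of memory banks.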
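In this embodiment, the memory regions 413 and 415 may be data buffer regions. The memory region 411 is for exclusive access by the special function processing cores, which may be, for example, the central processing unit core 351, the graphics processing unit core 352, and the digital signal processor core 353. The memory region 417 is for exclusive access by the artificial intelligence core 430. That is, when the special function processing cores and the artificial intelligence core 430 exclusively access the memory region 411 and the memory region 417, respectively, their accesses do not interfere with each other. For example, for a neural network operation, one entire row across the memory banks of the memory region 417 may store a plurality of weight values of the weight data. The artificial intelligence core 430 can read each row of the memory banks of its dedicated memory region 417 sequentially and in an interleaved manner through the row buffer block 418, quickly obtaining the data needed for the neural network operation.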
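One possible reading of "sequentially and in an interleaved manner" is that the AI core walks through the rows in order and rotates across the banks within each row index. The sketch below illustrates that reading only; the function `interleaved_rows`, the bank layout, and the weight values are assumptions made for the example and are not taken from the patent.

```python
def interleaved_rows(banks):
    """Yield (row_index, bank_index, row_data), rotating across banks for each row index."""
    rows = len(banks[0])
    for row in range(rows):
        for bank_index, bank in enumerate(banks):
            yield row, bank_index, bank[row]


# Hypothetical contents of a dedicated region: two banks, two rows each.
banks = [
    [[0.1, 0.2], [0.3, 0.4]],   # bank 0
    [[0.5, 0.6], [0.7, 0.8]],   # bank 1
]

# Flatten the rows in the order they would be streamed to the AI core.
weights = [w for _, _, data in interleaved_rows(banks) for w in data]
print(weights)
```

Rotating across banks lets one bank prepare its next row while another is being read, which is the usual motivation for bank interleaving in DRAM.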
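FIG. 4A and FIG. 4B are schematic diagrams of swapping the addressing of different memory blocks among different memory spaces according to an embodiment of the present invention. Refer to FIG. 3, FIG. 4A, and FIG. 4B. One way of accessing the memory 400 is described below with reference to FIG. 4A and FIG. 4B, taking consecutive neural network operations on multiple image data as an example. The AI operations performed by the artificial intelligence core 430 may be, for example, deep neural network (DNN) operations, convolutional neural network (CNN) operations, or recurrent neural network (RNN) operations; the present invention is not limited in this regard. In one implementation scenario, the memory region 417 includes sub-memory regions 417_1 and 417_2. The sub-memory region 417_1 stores, for example, weight data having a plurality of weight values, and the sub-memory region 417_2 stores, for example, feature map data having a plurality of feature values. In this scenario, the memory region 413 is addressed to a special function processing core 354, and the memory region 415 is addressed to the artificial intelligence core 430. The special function processing core 354 may be, for example, the central processing unit core 351, the graphics processing unit core 352, or the digital signal processor core 353 of FIG. 3. Thus, as shown in FIG. 4A, the memory space 450 corresponding to the special function processing core 354 includes the memory regions 411 and 413, and the memory space 460 corresponding to the artificial intelligence core 430 includes the memory regions 415 and 417.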
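In this scenario, assume that the special function processing core 354 is the digital signal processor core 353 of FIG. 3, so the memory region 415 may hold digitized input data, such as image data, previously stored by the digital signal processor core 353. The artificial intelligence core 430 may, for example, perform a neural network operation to carry out image recognition on the current image data stored in the memory region 415. The artificial intelligence core 430 can read the weight data from the memory region 417 via its dedicated bus and read the image data from the memory region 415 as the input parameters of the neural network operation. Meanwhile, the digital signal processor core 353 can store the next image data in the memory region 413 through the memory interfaces 340 and 440.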
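Next, once the artificial intelligence core 430 has finished recognizing the image data in the memory region 415, the addressing targets of the memory regions 413 and 415 can be swapped by setting the mode register 420, which swaps the memory spaces to which these regions belong. After the swap, as shown in FIG. 4B, the memory space 450' corresponding to the digital signal processor core 353 includes the memory regions 411 and 415, and the memory space 460' corresponding to the artificial intelligence core 430 includes the memory regions 413 and 417. The artificial intelligence core 430 can then continue with the neural network operation to recognize the new image data stored in the memory region 413. The artificial intelligence core 430 can read the weight data from the sub-memory region 417_1 via its dedicated bus and read the next image data from the memory region 413 as the input parameters of the neural network operation. Meanwhile, the digital signal processor core 353 can overwrite the memory region 415 through the memory interfaces 340 and 440 to store the image data after next in the memory region 415. Accordingly, the memory 400 of this embodiment provides efficient data access and achieves high-speed neural network operations.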
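The swap of the memory regions 413 and 415 between the two memory spaces behaves like classic double buffering. The following sketch models it in software under stated assumptions: the function `run_pipeline`, the dictionary standing in for the mode register, and the frame labels are all hypothetical, and each loop iteration stands for a period in which the DSP core's write and the AI core's read happen concurrently on different regions.

```python
def run_pipeline(frames):
    """Simulate the addressing swap; return frames in the order the AI core consumes them."""
    regions = {"413": None, "415": None}
    # Stand-in for the mode register: which buffer region each core currently owns.
    mode = {"dsp": "415", "ai": "413"}
    recognized = []
    for frame in list(frames) + [None]:      # one extra step drains the last buffer
        ai_input = regions[mode["ai"]]       # AI core reads its current buffer
        if ai_input is not None:
            recognized.append(ai_input)
        regions[mode["dsp"]] = frame         # DSP core overwrites its current buffer
        # Updating the mode settings swaps the addressing of the two buffers.
        mode["dsp"], mode["ai"] = mode["ai"], mode["dsp"]
    return recognized


result = run_pipeline(["frame0", "frame1", "frame2"])
print(result)
```

The AI core always works on the frame written during the previous step, so data production and recognition overlap without any copying between regions.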
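FIG. 5A and FIG. 5B are schematic diagrams of swapping access among different memory blocks within the same memory space according to an embodiment of the present invention. Refer to FIG. 3, FIG. 5A, and FIG. 5B. Another way of accessing the memory 400 is described below with reference to FIG. 5A and FIG. 5B, taking a neural network operation on image data as an example. In this scenario, at the input layer stage of the neural network operation, the memory space 550 corresponding to the artificial intelligence core 430 may include, for example, the memory region 415 and the sub-memory regions 417_1 and 417_2. The artificial intelligence core 430 reads the memory region 415 to obtain input data and uses it as the input parameters. The memory region 415 holds image data previously stored by the digital signal processor core 353. The artificial intelligence core 430 also reads the weight data from the sub-memory region 417_1. The artificial intelligence core 430 then performs the neural network operation based on the input parameters and the weight data to generate feature map data, and stores the feature map data in the sub-memory region 417_2.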
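Next, at the following hidden layer stage of the neural network operation, the memory space 550' corresponding to the artificial intelligence core 430 includes the memory region 415 and the sub-memory regions 417_1 and 417_2. The artificial intelligence core 430 reads the feature map data previously stored in the sub-memory region 417_2 as the input parameters of the current hidden layer, and reads the weight data from the sub-memory region 417_1. The artificial intelligence core 430 then performs the neural network operation based on the input parameters and the weight data to generate new feature map data, and overwrites the memory region 415 with the new feature map data. In other words, the memory regions addressed to the artificial intelligence core 430 stay the same, but its read and store target addresses are swapped. By extension, the artificial intelligence core 430 of this embodiment can use the memory region 415 and the sub-memory region 417_2 in rotation to read previously generated feature map data and to store the current feature map data generated during the current neural network operation. Because each memory region has its own independent bus, the artificial intelligence core 430 can quickly obtain input data and weight data, perform the neural network operation quickly, and store the output data.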
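The rotation of read and store targets between the memory region 415 and the sub-memory region 417_2 can be sketched as follows. This is a toy model under stated assumptions: the function `run_layers`, the `dense` layer (a matrix-vector product followed by ReLU), and the numeric values are hypothetical and stand in for whatever neural network operation the AI core actually performs.

```python
def run_layers(input_data, layer_weights, layer_fn):
    """Alternate read/store targets between '415' and '417_2' across layers."""
    regions = {"415": input_data, "417_1": layer_weights, "417_2": None}
    src, dst = "415", "417_2"
    for weights in regions["417_1"]:
        # Read the input from src, read weights from 417_1, store the result in dst.
        regions[dst] = layer_fn(regions[src], weights)
        src, dst = dst, src                  # swap read and store targets
    return regions[src], src


def dense(x, w):
    # Toy layer: matrix-vector product followed by ReLU.
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]


output, location = run_layers(
    [1.0, 2.0],                                      # input data initially in region 415
    [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]],        # two layers of weights in 417_1
    dense,
)
print(output, location)
```

Only the roles of the two regions change from layer to layer; no feature map is copied, which matches the statement that the addressed regions stay the same while the read and store targets swap.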
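FIG. 6 is a flowchart of a memory operating method according to an embodiment of the present invention. Referring to FIG. 6, the memory operating method of this embodiment is applicable at least to the memory 100 of FIG. 1, so that the memory 100 performs steps S610 and S620. The memory interface 140 of the memory 100 may be externally coupled to a special function processing core. In step S610, a plurality of memory regions of the memory array 110 are selectively addressed to the memory spaces of the special function processing core and the artificial intelligence core 130, respectively, according to a plurality of memory mode settings of the mode register 120. In step S620, the special function processing core and the artificial intelligence core 130 access different memory regions of the memory array 110 according to the memory mode settings. The memory operating method of this embodiment therefore allows the memory 100 to be accessed by the special function processing core and the artificial intelligence core 130 at the same time, providing efficient memory operation.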
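For the related internal components, implementations, and technical details of the memory 100 of this embodiment, sufficient teaching, suggestions, and implementation guidance can be found in the description of the embodiments of FIG. 1 to FIG. 5B above, so they are not repeated here.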
In summary, in the memory and operating method of the present invention, the mode register is designed with a plurality of specific memory mode settings, so that the memory regions of the memory array can be selectively addressed to the external special function processing core and the artificial intelligence core, respectively, according to those settings, and the two cores can access different memory regions of the memory array simultaneously. The artificial intelligence core disposed in the memory can therefore perform neural network operations quickly.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the technical field may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.
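100, 200, 400: memory
110: memory array
120, 220, 420: mode register
130, 230, 430: artificial intelligence core
140, 240, 440: memory interface
211, 213, 411, 413, 415, 417: memory region
212, 214, 412, 414, 416, 418: row buffer block
340: memory interface
351: central processing unit core
352: graphics processing unit core
353: digital signal processor core
354: special function processing core
417_1, 417_2: sub-memory region
450, 450', 460, 460', 550, 550': memory space
S610, S620: step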
FIG. 1 is a block diagram of a memory according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the architecture of a memory and a plurality of special function processing cores according to another embodiment of the present invention.
FIG. 4A and FIG. 4B are schematic diagrams of swapping the addressing of different memory blocks among different memory spaces according to an embodiment of the present invention.
FIG. 5A and FIG. 5B are schematic diagrams of swapping access among different memory blocks within the same memory space according to an embodiment of the present invention.
FIG. 6 is a flowchart of a memory operating method according to an embodiment of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/563,956 US10990524B2 (en) | 2018-10-11 | 2019-09-09 | Memory with processing in memory architecture and operating method thereof |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862744140P | 2018-10-11 | 2018-10-11 | |
US62/744,140 | 2018-10-11 | ||
US201862785234P | 2018-12-27 | 2018-12-27 | |
US62/785,234 | 2018-12-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202014895A TW202014895A (en) | 2020-04-16 |
TWI749331B true TWI749331B (en) | 2021-12-11 |
Family
ID=70231709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW108119618A TWI749331B (en) | 2018-10-11 | 2019-06-06 | Memory with processing in memory architecture and operating method thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111047029B (en) |
TW (1) | TWI749331B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20240363161A1 (en) * | 2023-04-26 | 2024-10-31 | Macronix International Co., Ltd. | Electronic device and method for operating the same |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010052061A1 (en) * | 1999-10-04 | 2001-12-13 | Storagequest Inc. | Apparatus And Method For Managing Data Storage |
CN107402901A (en) * | 2016-05-20 | 2017-11-28 | 三星电子株式会社 | The storage device shared by two or more processors and include its system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100816053B1 (en) * | 2006-11-21 | 2008-03-21 | 엠텍비젼 주식회사 | Memory device, memory system and dual port memory device with self-copy function |
US8719516B2 (en) * | 2009-10-21 | 2014-05-06 | Micron Technology, Inc. | Memory having internal processors and methods of controlling memory access |
CN105654419A (en) * | 2016-01-25 | 2016-06-08 | 上海华力创通半导体有限公司 | Operation processing system and operation processing method of image |
CN116842306A (en) * | 2016-03-23 | 2023-10-03 | Gsi 科技公司 | In-memory matrix multiplication and use thereof in neural networks |
-
2019
- 2019-06-06 TW TW108119618A patent/TWI749331B/en active
- 2019-06-24 CN CN201910547680.1A patent/CN111047029B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111047029A (en) | 2020-04-21 |
CN111047029B (en) | 2023-04-18 |
TW202014895A (en) | 2020-04-16 |