CN112925739A - Communication method applied to many-core chip, many-core chip and storage medium - Google Patents
- Publication number
- CN112925739A (application CN202110295620.2A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- core
- time sequence
- functional
- primitive
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
Abstract
The present disclosure relates to the field of communications technologies, and in particular to a communication method applied to a many-core chip, a many-core chip, and a storage medium. The many-core chip comprises at least a first timing cluster and a second timing cluster whose working cycles are asynchronous. A functional core in the first timing cluster sends an immediate primitive request to a functional core in the second timing cluster; the request instructs the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster. After receiving the immediate primitive response, the functional core in the first timing cluster performs data transmission with the functional core in the second timing cluster. By means of this immediate primitive request and response mechanism, the embodiments of the disclosure enable data transmission between asynchronous timing clusters in the many-core chip, which reduces the overall waiting time in the chip, improves computational efficiency and execution speed, and improves the overall operating performance of the many-core chip.
Description
Technical Field
The present disclosure relates to the field of communications technologies, and in particular to a communication method applied to a many-core chip, a many-core chip, and a storage medium.
Background
With the development of artificial intelligence technology, users demand ever greater processing capability from chips. Because the processing capability of a single-core chip is limited, many-core chips are increasingly widely used: they increase processing capability by distributing computation tasks across multiple cores for parallel execution. In a many-core chip, each core must not only perform the computation tasks assigned to it but also exchange data with other cores. The mechanism by which intra-core computation tasks and inter-core communication tasks are executed therefore affects the overall operating performance of the many-core chip.
The related art does not provide a reasonable and effective way to improve the overall operating performance of many-core chips.
Disclosure of Invention
In view of this, the present disclosure provides a communication method applied to a many-core chip, a many-core chip, and a storage medium. The technical solution is as follows:
According to one aspect of the present disclosure, there is provided a communication method applied to a many-core chip, the many-core chip comprising at least a first timing cluster and a second timing cluster whose working cycles are asynchronous, the method comprising:
sending, by a functional core in the first timing cluster, an immediate primitive request to a functional core in the second timing cluster, wherein the immediate primitive request instructs the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster; and
after receiving the immediate primitive response, performing, by the functional core in the first timing cluster, data transmission with the functional core in the second timing cluster.
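The ordering of these two steps can be sketched as a small timing model in Python. This is only an illustration under stated assumptions, not the disclosed implementation: the `TimingCluster` class, the cycle lengths, and the zero message latency are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimingCluster:
    """Hypothetical timing cluster whose working cycles have their own lengths."""
    name: str
    cycle_lengths: List[float]  # duration of each working cycle (arbitrary time units)

    def current_cycle_end(self, t: float) -> float:
        """Return the end time of the working cycle in progress at time t."""
        end = 0.0
        for length in self.cycle_lengths:
            end += length
            if t < end:
                return end
        raise ValueError("t lies beyond the modelled working cycles")


def handshake_time(request_time: float, responder: TimingCluster) -> float:
    """Earliest time at which the requester may start data transmission.

    The responder defers the immediate primitive response until its current
    working cycle ends; message latency is assumed to be zero.
    """
    return responder.current_cycle_end(request_time)


second = TimingCluster("second", [4.0, 6.0, 5.0])
print(handshake_time(5.0, second))  # request arrives during cycle 2 -> 10.0
```

In this model the requester waits only for the responder's current cycle to finish, not for a chip-wide synchronization signal.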
In one possible implementation, sending, by the functional core in the first timing cluster, the immediate primitive request to the functional core in the second timing cluster comprises:
acquiring, by the functional core in the first timing cluster, a static primitive instruction of a first working cycle, wherein the static primitive instruction indicates a pre-configured inter-core communication task, and the first working cycle is the current working cycle of the first timing cluster; and
sending, by the functional core in the first timing cluster, the immediate primitive request to the functional core in the second timing cluster within the first working cycle according to the inter-core communication task.
In another possible implementation, after the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the method further comprises:
after receiving the immediate primitive request in a second working cycle, storing, by the functional core in the second timing cluster, the immediate primitive request in a cache, wherein the second working cycle is the current working cycle of the second timing cluster and is different from the first working cycle; and
returning, by the functional core in the second timing cluster, an immediate primitive response corresponding to the immediate primitive request after the second working cycle.
In another possible implementation, returning, by the functional core in the second timing cluster, the immediate primitive response corresponding to the immediate primitive request after the second working cycle comprises:
returning, by the functional core in the second timing cluster, the immediate primitive response corresponding to the immediate primitive request after a specified number of static primitive instructions have been executed, or after a target working cycle following the second working cycle;
wherein the difference in cycle count between the target working cycle and the second working cycle is a preset threshold.
In another possible implementation, performing, by the functional core in the first timing cluster, data transmission with the functional core in the second timing cluster after receiving the immediate primitive response comprises:
after receiving the immediate primitive response, sending, by the functional core in the first timing cluster, data to be sent to the functional core in the second timing cluster according to the static primitive instruction of the first working cycle, and/or receiving data sent by the functional core in the second timing cluster.
In another possible implementation, before the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the method further comprises:
dividing a plurality of functional cores in the many-core chip according to the total computation task to be executed, to obtain a plurality of timing clusters with asynchronous working cycles, the plurality of timing clusters comprising at least the first timing cluster and the second timing cluster;
wherein each timing cluster comprises one or more functional cores with synchronized working cycles, each functional core being configured to perform a portion of the total computation task.
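The disclosure does not fix a partitioning policy. As one hypothetical example, cores assigned to the same portion of the total computation task can be grouped into one timing cluster. The function name, the core ids, and the sub-task labels below are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def partition_into_timing_clusters(core_tasks: Dict[int, str]) -> Dict[str, List[int]]:
    """Group functional cores that work on the same sub-task into one timing cluster.

    core_tasks maps a hypothetical core id to the name of the portion of the
    total computation task assigned to that core. Cores inside one cluster
    share a synchronized working cycle; different clusters run asynchronously.
    """
    clusters: Dict[str, List[int]] = defaultdict(list)
    for core_id, sub_task in core_tasks.items():
        clusters[sub_task].append(core_id)
    # Sort core ids so the result is deterministic.
    return {sub_task: sorted(cores) for sub_task, cores in clusters.items()}


assignment = {0: "conv1", 1: "conv1", 2: "fc", 3: "fc", 4: "fc"}
print(partition_into_timing_clusters(assignment))
# {'conv1': [0, 1], 'fc': [2, 3, 4]}
```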
In another possible implementation, the second timing cluster comprises a plurality of functional cores, and after the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the method further comprises:
after receiving the immediate primitive request sent by the functional core in the first timing cluster, sending, by any functional core in the second timing cluster, the immediate primitive request by multicast to the other functional cores in the second timing cluster within the current working cycle of the second timing cluster.
In another possible implementation, the method further comprises:
returning, by each of the other functional cores in the second timing cluster, a multicast response after receiving the multicast immediate primitive request; and
ending the current working cycle of the second timing cluster when every functional core in the second timing cluster has completed its computation task for the current working cycle and all the other functional cores in the second timing cluster have returned the multicast response.
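The end-of-cycle condition above can be written as a single predicate. In this sketch the bookkeeping structures (`compute_done`, `multicast_acks`) and the name `relay_core` for the core that received and multicast the request are hypothetical.

```python
from typing import Dict, Iterable, Set


def cycle_may_end(
    compute_done: Dict[int, bool],
    multicast_acks: Set[int],
    relay_core: int,
    cluster_cores: Iterable[int],
) -> bool:
    """Return True when the second timing cluster may end its current working cycle.

    relay_core is the core that received the immediate primitive request and
    multicast it. Every other core must have returned a multicast response,
    and every core must have finished its computation for this cycle.
    """
    cores = set(cluster_cores)
    others = cores - {relay_core}
    all_computed = all(compute_done.get(c, False) for c in cores)
    all_acked = others <= multicast_acks
    return all_computed and all_acked


cores = [10, 11, 12]
done = {10: True, 11: True, 12: True}
print(cycle_may_end(done, {11}, relay_core=10, cluster_cores=cores))      # False
print(cycle_may_end(done, {11, 12}, relay_core=10, cluster_cores=cores))  # True
```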
In another possible implementation, the first timing cluster comprises a plurality of functional cores, and the method further comprises:
after receiving the immediate primitive response, sending, by any functional core in the first timing cluster, the immediate primitive response by multicast to the other functional cores in the first timing cluster.
According to another aspect of the present disclosure, there is provided a many-core chip comprising at least a first timing cluster and a second timing cluster whose working cycles are asynchronous, wherein:
the functional core in the first timing cluster is configured to send an immediate primitive request to a functional core in the second timing cluster, the immediate primitive request instructing the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster; and
the functional core in the first timing cluster is further configured to perform data transmission with the functional core in the second timing cluster after receiving the immediate primitive response.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
The embodiments of the present disclosure provide a communication method applied to a many-core chip that comprises at least a first timing cluster and a second timing cluster whose working cycles are asynchronous. A functional core in the first timing cluster sends an immediate primitive request to a functional core in the second timing cluster, the request instructing that functional core to return an immediate primitive response after the current working cycle of the second timing cluster; after receiving the immediate primitive response, the functional core in the first timing cluster performs data transmission with the functional core in the second timing cluster. This avoids the related-art situation in which the many-core chip executes each core's local computation tasks and inter-core data transmission according to a globally synchronous timing. By means of the immediate primitive request and response mechanism, data transmission between asynchronous timing clusters in the many-core chip is realized, which reduces the overall waiting time in the chip, improves computational efficiency and execution speed, and improves the overall operating performance of the many-core chip.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 illustrates a flow chart of a communication method applied to a many-core chip provided by an exemplary embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 3 is a diagram illustrating a timing cluster partitioning method involved in a communication method applied to a many-core chip according to an exemplary embodiment of the disclosure;
FIG. 4 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a communication method applied to a many-core chip according to an exemplary embodiment of the disclosure;
FIG. 6 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of a communication method applied to a many-core chip according to another exemplary embodiment of the disclosure;
FIG. 8 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of a communication method applied to a many-core chip according to another exemplary embodiment of the disclosure;
FIG. 10 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 11 shows a schematic diagram of a communication method applied to a many-core chip according to another exemplary embodiment of the present disclosure;
FIG. 12 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 13 shows a schematic diagram of a communication method applied to a many-core chip according to another exemplary embodiment of the present disclosure;
FIG. 14 shows a flow diagram of a communication method applied to a many-core chip provided by another example embodiment of the present disclosure;
FIG. 15 shows a schematic diagram of a communication method applied to a many-core chip according to another exemplary embodiment of the present disclosure;
FIG. 16 shows a schematic structural diagram of a many-core chip according to an exemplary embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In a many-core chip, the execution mechanism of intra-core computation tasks and inter-core communication tasks affects the overall operating performance of the chip. To address this, the related art provides a Bulk Synchronous Parallel (BSP) mechanism, in which the operation of the many-core chip mainly consists of three phases: (1) a local computation phase, in which each core performs computation only on data stored in its local memory; (2) a global communication phase, in which each core operates on non-local data, including inter-core data transmission; and (3) a barrier synchronization phase, in which the cores wait until all communication has finished.
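The three BSP phases can be illustrated with a small Python simulation in which threads stand in for cores and `threading.Barrier` plays the role of barrier synchronization. The ring-shaped communication pattern and the function name `bsp_superstep` are illustrative assumptions, not details of the related art described here.

```python
import threading
from typing import List


def bsp_superstep(local_data: List[List[int]]) -> List[int]:
    """Run one BSP superstep on len(local_data) simulated cores (threads).

    Each core sums its local data, sends the sum to its right-hand neighbour
    in a ring (an illustrative pattern), then waits at a barrier before
    reading what it received.
    """
    n = len(local_data)
    barrier = threading.Barrier(n)
    lock = threading.Lock()
    mailboxes: List[List[int]] = [[] for _ in range(n)]
    received: List[int] = [0] * n

    def core(i: int) -> None:
        # Phase 1: local computation on data held in local memory only.
        partial = sum(local_data[i])
        # Phase 2: global communication (here: send to the next core in a ring).
        with lock:
            mailboxes[(i + 1) % n].append(partial)
        # Phase 3: barrier synchronization; no core proceeds until all sends are done.
        barrier.wait()
        received[i] = mailboxes[i][0]

    threads = [threading.Thread(target=core, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(bsp_superstep([[1, 2], [3], [4, 5]]))  # [9, 3, 3]
```

Note that computation and communication never overlap within a superstep, which is the serial behaviour criticised below.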
The related art also provides another mechanism, in which execution of the many-core chip is divided into multiple synchronous working cycles by a global synchronization signal. In each working cycle, every core executes its local computation task and its inter-core communication task simultaneously, transmitting computation results it has already obtained while performing local computation.
However, in both of the above approaches every core in the many-core chip must execute its local computation task and inter-core communication task according to a globally synchronous working cycle. When the amount of computation or the amount of data to be transmitted differs between cores, some cores run idle or wait, which lowers the utilization and execution efficiency of the many-core chip. In the first approach, moreover, local computation tasks and inter-core communication tasks are executed serially, which reduces execution efficiency further.
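The cost of forcing unequal workloads into one global cycle can be quantified with simple arithmetic: the cycle must last as long as the busiest core, so every other core idles for the remainder. The function name and the per-core busy times below are hypothetical figures chosen for illustration.

```python
from typing import List, Tuple


def global_sync_cost(per_core_busy: List[float]) -> Tuple[float, float, float]:
    """Cost of forcing cores with unequal workloads into one global working cycle.

    Returns (cycle_length, total_idle_time, utilization). The global cycle must
    be as long as the busiest core, so every other core idles for the rest.
    """
    cycle = max(per_core_busy)
    idle = sum(cycle - busy for busy in per_core_busy)
    utilization = sum(per_core_busy) / (cycle * len(per_core_busy))
    return cycle, idle, utilization


print(global_sync_cost([2.0, 3.0, 10.0]))  # (10.0, 15.0, 0.5)
```

With these example figures, half of the available core time in the cycle is spent idle.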
Different cores in a many-core chip may therefore have different computational requirements, and waiting for a global synchronization signal before switching cycles and transmitting data can leave some functional cores idle or waiting. Moreover, data exchange between cores may be needed at different times; forcing it to follow a synchronous cycle undermines the immediacy of data transmission. The related art does not provide a reasonable and effective way to improve the execution efficiency of intra-core computation tasks and inter-core communication tasks in a many-core chip.
The embodiments of the present disclosure therefore provide a communication method applied to a many-core chip in which an immediate primitive request and response mechanism enables data transmission between asynchronous timing clusters, reducing the overall waiting time in the many-core chip, improving computational efficiency and execution speed, and improving the overall operating performance of the chip.
First, the application scenario of the present disclosure is described. A neuromorphic chip system comprises one or more neuromorphic chips. Each neuromorphic chip comprises computing units and communication primitives; all the communication primitives together form a communication network, namely a routing communication network, and each computing unit is connected to the routing communication network through a communication primitive, thereby establishing communication connections among the computing units.
A computing unit may have a different total computation task in each working period. The routing communication network sends the computation outputs generated by a computing unit during the current working period to other computing units, where they serve as the data to be processed in the next working period.
In the embodiments of the present disclosure, the many-core chip comprises a plurality of functional cores. The many-core chip may also be referred to as the neuromorphic chip, and a functional core as a computing unit; that is, the functional cores transmit data through the routing communication network.
The functional cores of the many-core chip are divided into different timing clusters according to the computation task. Each timing cluster comprises one or more functional cores, and different timing clusters have asynchronous working cycles. That is, the many-core chip comprises at least two timing clusters, namely a first timing cluster and a second timing cluster, whose working cycles are asynchronous.
Within each working cycle, local computation and inter-core communication in each functional core are performed simultaneously in a pipelined manner. When a working cycle that requires asynchronous data transmission is executed, cores belonging to different asynchronous timing clusters establish a data transmission connection through the sequence: send an immediate primitive request, return an immediate primitive response, then transmit the data.
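The benefit of pipelining over serial execution can be estimated with a two-stage pipeline model, in which the results of cycle k are transmitted while cycle k+1 computes. This is a generic textbook model assumed for illustration; the phase durations are hypothetical, and a real chip may add buffering or routing delays it ignores.

```python
def serial_time(compute: float, comm: float, cycles: int) -> float:
    """Compute and communication phases strictly alternate (no overlap)."""
    return cycles * (compute + comm)


def pipelined_time(compute: float, comm: float, cycles: int) -> float:
    """Two-stage pipeline: cycle k's results are sent while cycle k+1 computes.

    The first compute phase and the last communication phase cannot overlap
    with anything; every step in between takes max(compute, comm).
    """
    if cycles <= 0:
        return 0.0
    return compute + (cycles - 1) * max(compute, comm) + comm


print(serial_time(3, 2, 4), pipelined_time(3, 2, 4))  # 20 14
```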
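The communication method applied to a many-core chip provided by the embodiments of the present disclosure is described below through several exemplary embodiments.
Referring to FIG. 1, which shows a flowchart of a communication method applied to a many-core chip according to an exemplary embodiment of the present disclosure, this embodiment is illustrated by applying the method to the many-core chip. The method comprises the following steps.
Step 101, the functional core in the first timing cluster sends an immediate primitive request to the functional core in the second timing cluster, the immediate primitive request instructing the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster.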
The many-core chip comprises at least a first timing cluster and a second timing cluster whose working cycles are asynchronous.
Because the working cycles of the two timing clusters are asynchronous, the start time, duration, or end time of the i-th working cycle of the first timing cluster differs from that of the second timing cluster, where i is a positive integer.
For the functional cores in a given timing cluster (either the first or the second timing cluster), the multiple working cycles may all have the same length, or at least two of them may differ.
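A quick way to see what "asynchronous" means here is to list the (start, end) window of each working cycle in both clusters. The cycle lengths below are hypothetical: the first cluster uses equal cycles and the second uses unequal ones, so their i-th windows do not line up.

```python
from itertools import accumulate
from typing import List, Tuple


def cycle_windows(lengths: List[float]) -> List[Tuple[float, float]]:
    """Return (start, end) of each working cycle, given the cycle lengths."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


first = cycle_windows([3, 3, 3])   # equal working cycles
second = cycle_windows([5, 2, 4])  # unequal working cycles
print(first)   # [(0, 3), (3, 6), (6, 9)]
print(second)  # [(0, 5), (5, 7), (7, 11)]
```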
Optionally, the immediate primitive request includes the storage location of an immediate primitive instruction in the functional core of the second timing cluster. The immediate primitive instruction is pre-configured and specifies that an immediate primitive response must be returned.
Optionally, the functional core in the first timing cluster acquires the static primitive instruction of the first working cycle, the static primitive instruction indicating a pre-configured inter-core communication task, and sends the immediate primitive request to the functional core in the second timing cluster within the first working cycle according to that inter-core communication task.
The first working cycle is the current working cycle of the first timing cluster; for example, it may be the first, second, or third working cycle of the first timing cluster. The embodiments of the present disclosure do not limit this.
Optionally, the static primitive instruction is a pre-configured instruction that indicates a pre-configured inter-core communication task.
Optionally, the static primitive instruction of a working cycle indicates the total computation task and the inter-core communication task pre-configured for that working cycle. The total computation task indicates the computation to be performed within the core during the working cycle. The inter-core communication task indicates the data that must be transmitted across cores during the working cycle; this data is either content computed in that working cycle or data stored in the core's memory before that working cycle.
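As a sketch, a static primitive instruction can be modelled as a record holding the cycle's computation task and its communication targets; only targets in a different (asynchronous) timing cluster would then need the immediate primitive handshake. The `StaticPrimitive` fields, the function name, and the rule that same-cluster peers need no handshake are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StaticPrimitive:
    """Hypothetical encoding of a pre-configured static primitive instruction."""
    cycle: int                     # working cycle the instruction belongs to
    compute_task: str              # computation to perform in this cycle
    send_to: Tuple[int, ...] = ()  # cores that must receive data this cycle


def immediate_request_targets(sp: StaticPrimitive, src_core: int,
                              cluster_of: Dict[int, str]) -> List[int]:
    """Destinations that lie in a different (asynchronous) timing cluster.

    Assumption: peers in the same cluster share the synchronized working
    cycle, so only cross-cluster destinations need an immediate request.
    """
    src_cluster = cluster_of[src_core]
    return [dst for dst in sp.send_to if cluster_of[dst] != src_cluster]


cluster_of = {0: "first", 1: "first", 5: "second", 6: "second"}
sp = StaticPrimitive(cycle=1, compute_task="layer1", send_to=(1, 5, 6))
print(immediate_request_targets(sp, 0, cluster_of))  # [5, 6]
```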
Optionally, after receiving the immediate primitive request in the second working cycle, the functional core in the second timing cluster stores the request in a cache and, after the second working cycle, returns the immediate primitive response corresponding to the request.
The second working cycle is the current working cycle of the second timing cluster and is different from the first working cycle.
Optionally, after the second working cycle, the functional core in the second timing cluster starts to execute the immediate primitive instruction indicated by the immediate primitive request and returns the corresponding immediate primitive response.
Optionally, after the second working cycle, the functional core in the second timing cluster reads the immediate primitive instruction from the storage location included in the immediate primitive request and returns an immediate primitive response.
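The buffering behaviour of the responding core can be sketched as follows: during the cycle a request is only cached, and at the end of the cycle the core reads each instruction from the stored location and builds a response. The message classes, their fields, and the dictionary standing in for instruction memory are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class ImmediatePrimitiveRequest:
    src_core: int       # requesting core in the first timing cluster
    instr_address: int  # storage location of the immediate primitive instruction


@dataclass(frozen=True)
class ImmediatePrimitiveResponse:
    dst_core: int       # core the response is returned to
    instruction: str    # immediate primitive instruction read from memory


class ResponderCore:
    """Hypothetical functional core in the second timing cluster."""

    def __init__(self, instruction_memory: Dict[int, str]) -> None:
        self.instruction_memory = instruction_memory
        self.pending: Deque[ImmediatePrimitiveRequest] = deque()

    def on_request(self, req: ImmediatePrimitiveRequest) -> None:
        # During the current working cycle the request is only cached.
        self.pending.append(req)

    def end_of_cycle(self) -> List[ImmediatePrimitiveResponse]:
        # After the cycle: read each instruction from its stored location and respond.
        responses = []
        while self.pending:
            req = self.pending.popleft()
            instruction = self.instruction_memory[req.instr_address]
            responses.append(ImmediatePrimitiveResponse(req.src_core, instruction))
        return responses


core = ResponderCore({0x40: "send_block_A"})
core.on_request(ImmediatePrimitiveRequest(src_core=3, instr_address=0x40))
print(core.end_of_cycle())
```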
Optionally, the functional core in the second timing cluster returns the immediate primitive response corresponding to the immediate primitive request after executing a specified number of static primitive instructions, or after a target working cycle following the second working cycle, where the difference in cycle count between the target working cycle and the second working cycle is a preset threshold.
For example, after executing the specified number of static primitive instructions, or after the target working cycle following the second working cycle, the functional core in the second timing cluster reads the immediate primitive instruction from the storage location included in the immediate primitive request and returns an immediate primitive response.
The specified number and the preset threshold are pre-configured. For example, if the specified number is 2, the functional core in the second timing cluster returns the immediate primitive response after executing 2 static primitive instructions. As another example, if the preset threshold is 1, the functional core in the second timing cluster returns the immediate primitive response after the first working cycle following the second working cycle.
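The two trigger conditions can be expressed as one predicate. The parameter names are hypothetical, and the rule that static instructions are counted from the end of the second working cycle is an assumption, since the text does not state where counting starts.

```python
from typing import Optional


def response_due(second_cycle: int, completed_cycle: int,
                 static_executed: int,
                 specified_number: Optional[int] = None,
                 preset_threshold: Optional[int] = None) -> bool:
    """Decide whether the immediate primitive response may be returned now.

    second_cycle     index of the working cycle in which the request arrived
    completed_cycle  index of the most recently completed working cycle
    static_executed  static primitive instructions executed since the second
                     working cycle ended (assumed counting rule)
    Either configured condition is sufficient.
    """
    if completed_cycle < second_cycle:
        return False  # the second working cycle itself has not ended yet
    if specified_number is not None and static_executed >= specified_number:
        return True
    if preset_threshold is not None and completed_cycle - second_cycle >= preset_threshold:
        return True
    return False


# Threshold 1: respond once the cycle after the second working cycle (index 4) completes.
print(response_due(4, 5, 0, preset_threshold=1))  # True
print(response_due(4, 4, 0, preset_threshold=1))  # False
```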
Optionally, the second timing cluster comprises a plurality of functional cores. After the functional core in the first timing cluster sends the immediate primitive request, whichever functional core in the second timing cluster receives it multicasts the request to the other functional cores in the second timing cluster within the current working cycle of the second timing cluster.
Optionally, data transmission among multiple functional cores in the same timing cluster is performed by node propagation.
It should be noted that, in the second timing cluster, the functional core that receives the immediate primitive request (i.e., the request sent by the functional core of the first timing cluster), the functional core that returns the immediate primitive response, and the functional core that performs data transmission with the first timing cluster may be the same core or different cores; the embodiments of the present disclosure do not limit this.
Step 102, after receiving the immediate primitive response, the functional core in the first timing cluster performs data transmission with the functional core in the second timing cluster.
Optionally, if the functional core in the first timing cluster has sent the immediate primitive request within the first working cycle and has completed the total computation task of that cycle but has not yet received the immediate primitive response, it continues to wait for the response. After receiving it, the core performs data transmission with the functional core in the second timing cluster and ends the working cycle once the data transmission is complete.
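The waiting rule above amounts to a small per-cycle state check on the requesting core. The `RequesterCycle` class and its flags are a hypothetical illustration, not a structure named in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class RequesterCycle:
    """Hypothetical per-cycle state of a functional core in the first timing cluster."""
    compute_done: bool = False
    request_sent: bool = False
    response_received: bool = False
    transfer_done: bool = False

    def can_end(self) -> bool:
        if not self.compute_done:
            return False
        if self.request_sent:
            # Keep waiting for the response, then for the data transmission.
            return self.response_received and self.transfer_done
        return True


state = RequesterCycle(compute_done=True, request_sent=True)
print(state.can_end())  # False: still waiting for the immediate primitive response
state.response_received = True
state.transfer_done = True
print(state.can_end())  # True
```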
Optionally, the first timing cluster comprises a plurality of functional cores, and the method further comprises: after receiving the immediate primitive response, any functional core in the first timing cluster multicasts the response to the other functional cores in the first timing cluster, so that those cores can also perform data transmission with the functional cores in the second timing cluster.
It should be noted that, in the first timing cluster, the functional core that sends the immediate primitive request, the functional core that receives the immediate primitive response (i.e., the response sent by the functional core of the second timing cluster), and the functional core that performs data transmission with the second timing cluster may be the same core or different cores; the embodiments of the present disclosure do not limit this.
Optionally, the data transmission that the functional core in the first timing cluster performs with the functional core in the second timing cluster after receiving the immediate primitive response includes: according to the static primitive instruction of the first working cycle, the functional core in the first timing cluster sends its data to be sent to the functional core in the second timing cluster, and/or receives data sent by the functional core in the second timing cluster.
For example, the functional core in the first timing cluster sends its data to be sent to the functional core in the second timing cluster according to the static primitive instruction of the first working cycle; correspondingly, the functional core in the second timing cluster receives the data.
For example, the functional core in the second timing cluster sends its data to be sent to the functional core in the first timing cluster according to the static primitive instruction of the second working cycle; correspondingly, the functional core in the first timing cluster receives the data.
To sum up, the embodiments of the present disclosure provide a communication method applied to a many-core chip. The many-core chip includes at least a first timing cluster and a second timing cluster whose working cycles are asynchronous. A functional core in the first timing cluster sends an immediate primitive request to a functional core in the second timing cluster; the request instructs the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster. After receiving the immediate primitive response, the functional core in the first timing cluster performs data transmission with the functional core in the second timing cluster. This removes the related-art requirement that a many-core chip execute each core's local computation task and all inter-core data transmission according to a globally synchronous timing. By exchanging immediate primitive requests and responses, the method realizes data transmission between asynchronous timing clusters in the many-core chip, which reduces the overall waiting time in the chip, improves computing efficiency and execution speed, and improves the overall operating performance of the many-core chip.
It should be noted that, before the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the many-core chip needs to divide its functional cores into different timing clusters according to the total computation task to be executed. Each timing cluster includes one or more functional cores, and different timing clusters have asynchronous working cycles. That is, as shown in Fig. 2, the following step is also included before step 101:
Step 201: divide the plurality of functional cores in the many-core chip according to the total computation task to be executed, to obtain a plurality of timing clusters with asynchronous working cycles, the plurality of timing clusters including at least the first timing cluster and the second timing cluster.
Each timing cluster includes one or more functional cores whose working cycles are synchronized, and these functional cores are configured to perform part of the total computation task.
Optionally, the functional cores in the many-core chip are divided into different timing clusters according to differences in the content or in the execution time of the parts of the total computation task; each timing cluster includes one or more functional cores, and different timing clusters have asynchronous working cycles.
Optionally, the total computation task is divided into a plurality of computation tasks (subtasks) according to its data amount, and the cores in the many-core chip are divided into a plurality of timing clusters. Each timing cluster includes one or more functional cores and executes a computation task, which is a part of the total computation task to be executed.
The union of the computation tasks executed by the timing clusters is the total computation task, and the computation tasks executed by any two of the timing clusters are disjoint.
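The two conditions above (full coverage and pairwise disjointness) can be checked mechanically. A minimal sketch, assuming tasks are identified by string labels; the function name and labels are hypothetical, with the four-task, two-cluster split borrowed from the Fig. 3 example:

```python
def is_valid_partition(total_task: set[str], cluster_tasks: dict[str, set[str]]) -> bool:
    """Check a division into timing clusters: the clusters' task sets are
    pairwise disjoint, and their union equals the total computation task."""
    covered: set[str] = set()
    for tasks in cluster_tasks.values():
        if covered & tasks:   # shares a task with an earlier cluster
            return False
        covered |= tasks
    return covered == total_task


total = {"task1", "task2", "task3", "task4"}
print(is_valid_partition(total, {"cluster1": {"task1", "task2"},
                                 "cluster2": {"task3", "task4"}}))           # True
print(is_valid_partition(total, {"cluster1": {"task1", "task2"},
                                 "cluster2": {"task2", "task3", "task4"}}))  # False
```

Checking each cluster against the union of the clusters seen so far is equivalent to checking every pair, but needs only one pass.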
Optionally, every core in the timing clusters performs the same amount of computation, or the absolute difference between the amounts of computation performed by any two cores in the timing clusters is smaller than a preset difference threshold.
For example, the total computation task is divided into four computation tasks: a first, a second, a third and a fourth computation task. The cores in the many-core chip are divided into two timing clusters, a first timing cluster and a second timing cluster; the cores in the first timing cluster execute the first and second computation tasks, and the cores in the second timing cluster execute the third and fourth computation tasks. The division strategy requires the absolute difference between a first quotient and a second quotient to be smaller than a preset difference threshold, where the first quotient is the sum of the data amounts of the first and second computation tasks divided by the number of functional cores in the first timing cluster, and the second quotient is the sum of the data amounts of the third and fourth computation tasks divided by the number of functional cores in the second timing cluster.
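Written out, the division strategy requires |(D1 + D2) / N1 - (D3 + D4) / N2| < threshold, where Di is the data amount of the i-th computation task and N1, N2 are the core counts of the two timing clusters. A minimal sketch of that check; the data amounts below are hypothetical, only the 9/7 core split comes from the Fig. 3 example, and the function names are invented:

```python
def per_core_quotient(data_amounts: list[float], num_cores: int) -> float:
    """Sum of the data amounts of a cluster's computation tasks divided by its core count."""
    if num_cores <= 0:
        raise ValueError("a timing cluster needs at least one functional core")
    return sum(data_amounts) / num_cores


def satisfies_division_strategy(q1: float, q2: float, threshold: float) -> bool:
    """The balance condition: |q1 - q2| < threshold."""
    return abs(q1 - q2) < threshold


q1 = per_core_quotient([45, 45], 9)   # tasks 1 and 2 on 9 cores -> 10.0
q2 = per_core_quotient([30, 40], 7)   # tasks 3 and 4 on 7 cores -> 10.0
print(satisfies_division_strategy(q1, q2, threshold=1.0))  # True
```

Comparing per-core quotients rather than raw task sizes lets a cluster with more data be acceptable as long as it also has proportionally more cores.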
Optionally, the division result is output by a preset mapper according to the total computation task to be executed.
Functional cores belonging to the same timing cluster have synchronized working cycles; a working cycle is the period in which the cores perform local computation and inter-core communication.
Over time, the many-core chip executes the total computation task continuously across a plurality of consecutive working cycles, each computation corresponding to one working cycle. For a functional core in a timing cluster, the working cycles may all have the same length, or at least two of them may differ.
Different timing clusters run according to their own internal synchronization signals, so their working cycles are asynchronous. That is, for two timing clusters, the start time, the length, or the end time of their respective i-th working cycles differ, where i is a positive integer.
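The asynchrony condition can be illustrated by laying out each cluster's working cycles back to back from its own synchronization signal. A minimal sketch, assuming cycle lengths in abstract ticks; the lengths and function names are hypothetical:

```python
from itertools import accumulate


def cycle_windows(lengths: list[int], t0: int = 0) -> list[tuple[int, int]]:
    """(start, end) of consecutive working cycles driven by one cluster's own sync signal."""
    bounds = list(accumulate(lengths, initial=t0))
    return list(zip(bounds[:-1], bounds[1:]))


def asynchronous_at(i: int, a: list[tuple[int, int]], b: list[tuple[int, int]]) -> bool:
    """True if the i-th cycles (1-based) differ in start time, length or end time."""
    (sa, ea), (sb, eb) = a[i - 1], b[i - 1]
    return sa != sb or (ea - sa) != (eb - sb) or ea != eb


cluster1 = cycle_windows([10, 12])  # [(0, 10), (10, 22)]
cluster2 = cycle_windows([14, 9])   # [(0, 14), (14, 23)]
print(asynchronous_at(1, cluster1, cluster2))  # True: same start, different length
print(asynchronous_at(2, cluster1, cluster2))  # True: different start
```

Within one cluster the cycle lengths may vary (here 10 then 12 ticks), which matches the statement that a cluster's working cycles need not all be equal.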
In an illustrative example, as shown in Fig. 3, a many-core chip obtains a total computation task consisting of four computation tasks (computation tasks 1 to 4) and, according to this task, divides its 16 cores into two timing clusters, timing cluster 1 and timing cluster 2. Timing cluster 1 contains 9 cores that execute computation tasks 1 and 2: its first working cycle executes computation task 1 and its second working cycle executes computation task 2. Timing cluster 2 contains 7 cores that execute computation tasks 3 and 4: its first working cycle executes computation task 3 and its second working cycle executes computation task 4. The two timing clusters run asynchronously.
Data transmission between these asynchronous timing clusters must then establish a synchronized connection and guarantee data integrity. To this end, the embodiments of the present disclosure provide a data transmission mechanism based on immediate primitive instructions.
The immediate primitive execution request is pre-stored in each core that needs to transmit data across timing clusters. When execution reaches a working cycle that requires asynchronous data transmission, the functional core sends a request to execute the immediate primitive instruction, i.e., the immediate primitive request, to functional cores of other timing clusters. A functional core of another timing cluster that receives the immediate primitive request stores it in its cache and, after finishing the task of its current working cycle, decides whether to respond to it. The functional core receiving the request can configure after which task the immediate primitive instruction is executed, or defer its execution until after the next working cycle. This guarantees that both parties to the asynchronous data transfer transmit data only after the computation on which the data depends has completed.
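The buffer-then-decide behaviour of the receiving core can be sketched as follows. This is a hypothetical model: the field `respond_after_cycle` stands in for the configurable deferral described above, and all class and field names are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ImmediateRequest:
    sender: str
    respond_after_cycle: int  # hypothetical: earliest cycle index whose end may trigger the response


@dataclass
class ReceivingCore:
    """Hypothetical model of a functional core that buffers immediate primitive requests."""
    name: str
    cycle: int = 0
    cache: list[ImmediateRequest] = field(default_factory=list)

    def receive(self, req: ImmediateRequest) -> None:
        # Buffer only; the current working cycle's task is not interrupted.
        self.cache.append(req)

    def end_cycle(self) -> list[str]:
        """Finish the current working cycle, then answer the cached requests that are due."""
        due = [r for r in self.cache if r.respond_after_cycle <= self.cycle]
        self.cache = [r for r in self.cache if r.respond_after_cycle > self.cycle]
        self.cycle += 1
        return [r.sender for r in due]  # cores that now receive an immediate primitive response


core2 = ReceivingCore("core2", cycle=2)
core2.receive(ImmediateRequest("core1", respond_after_cycle=2))  # answer at the end of this cycle
core2.receive(ImmediateRequest("core5", respond_after_cycle=3))  # defer until after the next cycle
print(core2.end_cycle())  # ['core1']
print(core2.end_cycle())  # ['core5']
```

Because the decision is taken only in `end_cycle`, a response is never issued while the computation it depends on is still running, which is the integrity property the paragraph above relies on.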
In summary, the communication method applied to the many-core chip provided by the embodiments of the present disclosure solves the problem of data interaction between asynchronous timing clusters in the many-core chip and guarantees data integrity, so that the many-core chip no longer needs to execute each functional core's local computation task and the inter-core data interaction according to a globally synchronous timing. This greatly reduces the overall waiting time in the many-core chip and improves computing efficiency and execution speed.
In addition, the communication method provided by the embodiments of the present disclosure uses a trigger-and-respond mechanism based on immediate primitive requests and still executes in a decentralized manner, which preserves the inherently high execution parallelism of the many-core chip without requiring an additional controller for interaction between timing clusters. The method also groups data interaction within the many-core chip: dense data interaction is carried out synchronously within a timing cluster, while communication with small data volumes and long intervals is placed across different timing clusters and carried out through the asynchronous data interaction mechanism. Furthermore, because the many-core chip is divided into timing clusters that execute different tasks, it can support the execution of several different tasks; in the application scenario of a neuromorphic chip, it can support a complex computing system composed of multiple neural networks.
It should be noted that asynchronous data transmission may occur between two or more asynchronous timing clusters, where two asynchronous timing clusters are two timing clusters whose working cycles are asynchronous.
Data transmission between the functional cores of two asynchronous timing clusters falls into four cases: one-to-one, one-to-many, many-to-one and many-to-many. The following four exemplary embodiments describe these cases, taking the two asynchronous timing clusters to be the first timing cluster and the second timing cluster. The many-core chip includes at least the first timing cluster and the second timing cluster, whose working cycles are asynchronous and each of which includes at least one functional core.
In one possible implementation, referring to Fig. 4, the one-to-one transmission flow between two asynchronous timing clusters is illustrated using the method in the many-core chip described above. The method includes, but is not limited to, the following steps.
Step 401: the first functional core sends an immediate primitive request to the second functional core in the first working cycle.
The first functional core is a functional core in the first timing cluster, and the second functional core is a functional core in the second timing cluster.
Optionally, the first functional core obtains the static primitive instruction of the first working cycle, which indicates a preconfigured inter-core communication task, the first working cycle being the current working cycle of the first timing cluster; according to this inter-core communication task, the first functional core sends the immediate primitive request to the second functional core in the first working cycle.
For details of how the first functional core sends the immediate primitive request to the second functional core, refer to the related description in the above embodiments; they are not repeated here.
The second working cycle is the current working cycle of the second timing cluster and differs from the first working cycle.
Optionally, the second functional core stores the immediate primitive request in an immediate primitive instruction cache of the control unit.
Step 403: after the second working cycle, the second functional core returns the immediate primitive response corresponding to the immediate primitive request.
Optionally, the second functional core returns the immediate primitive response corresponding to the immediate primitive request to the first functional core after it has executed a specified number of static primitive instructions, or after a target working cycle following the second working cycle.
The immediate primitive request includes the storage location of the immediate primitive instruction in the second functional core. The second functional core reads the immediate primitive instruction from the storage location given in the request and returns the immediate primitive response corresponding to that request.
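A minimal sketch of a request that carries the instruction's storage location, and of the receiving core resolving it. The field names, the dictionary used as instruction memory, and the instruction text are all hypothetical stand-ins; the text does not specify the actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmediatePrimitiveRequest:
    src_core: int
    instr_addr: int  # storage location of the immediate primitive instruction in the receiving core


@dataclass(frozen=True)
class ImmediatePrimitiveResponse:
    src_core: int
    dst_core: int
    instruction: str  # the instruction that was read at instr_addr


def respond(core_id: int, instr_mem: dict[int, str],
            req: ImmediatePrimitiveRequest) -> ImmediatePrimitiveResponse:
    """Read the instruction at the address carried by the request and answer the requester."""
    instruction = instr_mem[req.instr_addr]  # KeyError if the address is not populated
    return ImmediatePrimitiveResponse(src_core=core_id, dst_core=req.src_core,
                                      instruction=instruction)


# Hypothetical instruction memory of functional core 2.
mem = {0x40: "send block A to core 1"}
resp = respond(2, mem, ImmediatePrimitiveRequest(src_core=1, instr_addr=0x40))
print(resp.dst_core, resp.instruction)  # 1 send block A to core 1
```

Carrying an address rather than the instruction itself keeps the request small; the instruction body stays pre-stored in the core that will execute it.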
For details of how the second functional core returns the immediate primitive response after the second working cycle, refer to the description in the above embodiments; they are not repeated here.
Step 404: the first functional core receives the immediate primitive response sent by the second functional core.
The first functional core receives this response within the first working cycle.
Optionally, if the first functional core has sent the immediate primitive request to the second functional core in the first working cycle and the computation task of the first working cycle is completed before the immediate primitive response is received, the first functional core keeps waiting for the response and executes step 405 once it arrives.
Step 405: data transmission is performed between the first functional core and the second functional core.
Optionally, after receiving the immediate primitive response in the first working cycle, the first functional core performs data transmission with the second functional core within the same first working cycle.
Optionally, the data transmission between the first functional core and the second functional core includes: the first functional core receives data sent by the second functional core, and/or the first functional core sends its data to be sent to the second functional core according to the static primitive instruction of the first working cycle.
In an illustrative example, the principle of one-to-one transmission between two asynchronous timing clusters (i.e., data transmission between functional core 1 in timing cluster 1 and functional core 2 in timing cluster 2) is shown in Fig. 5. After working cycle S2 starts, functional core 1 obtains static primitive instruction 2 of working cycle S2 and, according to the inter-core communication task it indicates, sends an immediate primitive request to functional core 2 during S2. Functional core 2 receives the request during its working cycle K2 and stores it in its cache. After working cycle K3 starts, functional core 2 executes the immediate primitive instruction indicated by the request and returns an immediate primitive response. Functional core 1 receives the response during working cycle S2 and then performs data transmission with functional core 2.
In another possible implementation, referring to Fig. 6, the many-to-one transmission flow between two asynchronous timing clusters is illustrated using the method in the many-core chip described above. The method includes, but is not limited to, the following steps.
Step 601: the first functional core sends an immediate primitive request to the second functional core in the first working cycle.
The first functional core is any one of the plurality of functional cores in the first timing cluster, and the second functional core is a functional core in the second timing cluster. For example, if the first timing cluster includes functional cores 1, 2 and 3, the first functional core is any one of them.
The first working cycle is the current working cycle of the first timing cluster.
The second working cycle is the current working cycle of the second timing cluster and differs from the first working cycle.
Step 603: after the second working cycle, the second functional core sends the immediate primitive response corresponding to the immediate primitive request to the third functional core.
The third functional core is any one of the plurality of functional cores in the first timing cluster.
Optionally, the third functional core is the first functional core, i.e., they are the same functional core; alternatively, the third functional core differs from the first functional core, i.e., they are two different functional cores in the first timing cluster.
Optionally, the second functional core sends the immediate primitive response corresponding to the immediate primitive request to the third functional core after it has executed a specified number of static primitive instructions, or after a target working cycle following the second working cycle.
Step 604: after receiving the immediate primitive response, the third functional core multicasts it to the other functional cores in the first timing cluster.
The other functional cores in the first timing cluster are all functional cores in the first timing cluster except the third functional core.
Optionally, after receiving the immediate primitive response, the third functional core either multicasts it directly to each of the other functional cores in the first timing cluster, or multicasts it to them one after another in sequence (similar to propagation along a chain of nodes).
For example, a functional core at an intermediate node of the first timing cluster, after receiving the immediate primitive response, multicasts it to the next functional core in the first timing cluster and then executes step 605; a functional core at the tail node of the first timing cluster executes step 605 directly after receiving the response.
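The chain-style (node-propagation) variant can be traced as an event log. A minimal sketch under stated assumptions: the chain order is given up front, forwarding is modelled as instantaneous, and the function name and log format are hypothetical. The core order follows the Fig. 7 example (core 3, then core 2, then core 1).

```python
def relay_response(chain: list[str]) -> list[str]:
    """Event log for chain-style multicast of an immediate primitive response.

    chain[0] is the core that received the response from the other timing
    cluster (the third functional core); chain[-1] is the tail node.
    """
    log = []
    for i, core in enumerate(chain):
        log.append(f"{core}: response received")
        if i + 1 < len(chain):  # intermediate node: forward to the next core first
            log.append(f"{core} -> {chain[i + 1]}: response forwarded")
        log.append(f"{core}: step 605, data transmission starts")
    return log


for event in relay_response(["core3", "core2", "core1"]):
    print(event)
```

Each core starts its own transfer as soon as it holds the response, so cores near the head of the chain do not wait for the tail; the trade-off against direct multicast is a longer delay before the last core can start.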
Step 605: after receiving the immediate primitive response, the functional cores of the first timing cluster perform data transmission with the second functional core.
Any of the functional cores of the first timing cluster can start data transmission with the second functional core once it has received the immediate primitive response.
It should be noted that, for the relevant details of each step in this embodiment, reference may be made to the related description in the above embodiments; they are not repeated here.
In an illustrative example, the principle of many-to-one transmission between two asynchronous timing clusters (i.e., data transmission between functional cores 1 to 3 in timing cluster 1 and functional core 4 in timing cluster 2) is shown in Fig. 7. After working cycle S2 starts, functional core 1 obtains static primitive instruction 2 of working cycle S2 and, according to the inter-core communication task it indicates, sends an immediate primitive request to functional core 4 during S2. Functional core 4 receives the request during working cycle K2 and stores it in its cache; after working cycle K3 starts, functional core 4 executes the immediate primitive instruction indicated by the request and returns an immediate primitive response to functional core 3. Functional core 3 receives the response during working cycle S2 and multicasts it to functional core 2, which in turn multicasts it to functional core 1. Each of functional cores 1 to 3 performs data transmission with functional core 4 after receiving the immediate primitive response.
In another possible implementation, referring to Fig. 8, the one-to-many transmission flow between two asynchronous timing clusters is illustrated using the method in the many-core chip described above. The method includes, but is not limited to, the following steps.
Step 801: the first functional core sends an immediate primitive request to the second functional core in the first working cycle.
The first functional core is a functional core in the first timing cluster, and the second functional core is any one of the plurality of functional cores in the second timing cluster. For example, if the second timing cluster includes functional cores 2, 3 and 4, the second functional core is any one of them.
The first working cycle is the current working cycle of the first timing cluster.
The second working cycle is the current working cycle of the second timing cluster and differs from the first working cycle.
Step 803: in the second working cycle, the second functional core multicasts the immediate primitive request to the other functional cores in the second timing cluster.
Optionally, after receiving the immediate primitive request in the second working cycle, the second functional core stores it in its cache and, at the same time, multicasts it to the other functional cores in the second timing cluster. Correspondingly, those functional cores receive and store the request.
To guarantee that the multicast immediate primitive request is received by all cores within the same working cycle, the second timing cluster must wait until every functional core in it has received the request before ending its current working cycle, i.e., the second working cycle; this is guaranteed by multicast responses. Optionally, each of the other functional cores in the second timing cluster returns a multicast response after receiving the multicast immediate primitive request, and the current working cycle of the second timing cluster ends when every functional core in the cluster has completed the computation task of that working cycle and all of the other functional cores have returned their multicast responses.
In one possible implementation, a designated counter with an initial value of zero is used. Each time a functional core in the second timing cluster multicasts the immediate primitive request to other functional cores in the cluster during the second working cycle, the designated counter is incremented by one. Each of the other functional cores returns a multicast response after receiving the multicast request, and each time a functional core in the second timing cluster receives a multicast response, the designated counter is decremented by one. The second working cycle ends when every functional core in the second timing cluster has completed the computation task of the second working cycle and the designated counter is zero. It should be noted that the embodiments of the present disclosure do not limit this.
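The designated-counter rule can be sketched as a small state object. This is a hypothetical model: it treats the designated counter as one counter shared by the whole timing cluster (the text does not say whether each core keeps its own), and the class and method names are invented. The event sequence follows the core numbering of the Fig. 9 example.

```python
class DesignatedCounter:
    """Hypothetical model of the designated counter that gates the end of a working cycle.

    Assumed to be a single counter shared by the whole timing cluster.
    """

    def __init__(self, cores: list[str]) -> None:
        self.value = 0  # initial value is zero
        self.task_done = {core: False for core in cores}

    def multicast_sent(self) -> None:
        self.value += 1  # one more multicast awaiting its multicast response

    def multicast_response_received(self) -> None:
        if self.value == 0:
            raise RuntimeError("multicast response without a pending multicast")
        self.value -= 1

    def finish_task(self, core: str) -> None:
        self.task_done[core] = True

    def cycle_may_end(self) -> bool:
        # Both conditions: all computation done and no multicast left unacknowledged.
        return self.value == 0 and all(self.task_done.values())


k2 = DesignatedCounter(["core2", "core3", "core4"])
k2.multicast_sent()                # core2 multicasts the request to core3
k2.multicast_sent()                # core3 multicasts the request to core4
for core in ("core2", "core3", "core4"):
    k2.finish_task(core)
print(k2.cycle_may_end())          # False: two multicast responses outstanding
k2.multicast_response_received()   # core3 -> core2
k2.multicast_response_received()   # core4 -> core3
print(k2.cycle_may_end())          # True
```

The counter makes the cycle-end decision local and cheap: no core has to know which peers were reached, only that every multicast sent so far has been acknowledged.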
Step 804: after the second working cycle, the third functional core sends the immediate primitive response corresponding to the immediate primitive request to the first functional core.
The third functional core is any one of the plurality of functional cores in the second timing cluster.
Optionally, the third functional core is the second functional core, i.e., they are the same functional core; alternatively, the third functional core differs from the second functional core, i.e., they are two different functional cores in the second timing cluster.
Optionally, the third functional core sends the immediate primitive response corresponding to the immediate primitive request to the first functional core after it has executed a specified number of static primitive instructions, or after a target working cycle following the second working cycle.
Step 805: after receiving the immediate primitive response, the first functional core performs data transmission with the functional cores in the second timing cluster.
In one possible implementation, after receiving the immediate primitive response, the first functional core performs data transmission with each functional core in the second timing cluster.
In another possible implementation, after receiving the immediate primitive response from the third functional core, the first functional core performs data transmission with the third functional core.
Optionally, the third functional core then forwards the received data to the other functional cores in the second timing cluster.
It should be noted that, for the relevant details of each step in this embodiment, reference may be made to the related description in the above embodiments; they are not repeated here.
In an illustrative example, the principle of one-to-many transmission between two asynchronous timing clusters (i.e., data transmission between functional core 1 in timing cluster 1 and functional cores 2 to 4 in timing cluster 2) is shown in Fig. 9. After working cycle S2 starts, functional core 1 obtains the static primitive instruction of working cycle S2 and, according to the inter-core communication task it indicates, sends an immediate primitive request to functional core 2 during S2. In working cycle K2, functional core 2 stores the request in its cache, multicasts it to functional core 3, and increments the designated counter by one. Functional core 3 performs the same receive-and-store operation, multicasts the request to functional core 4, and increments the designated counter by one. After receiving the multicast immediate primitive request, functional core 3 returns a multicast response to functional core 2, and functional core 4 returns a multicast response to functional core 3. On receiving a multicast response, functional cores 2 and 3 each decrement the designated counter by one. Working cycle K2 ends when functional cores 2 to 4 have completed the computation task of K2 and the designated counter is zero. After working cycle K3 starts, functional core 3 executes the immediate primitive instruction indicated by the request and returns an immediate primitive response to functional core 1. Functional core 1 receives the response during working cycle S2 and then performs data transmission with functional cores 2 to 4.
In another possible implementation, referring to fig. 10, the many-to-many transmission process between two asynchronous timing clusters is illustrated by using the method in the many-core chip described above, and the method includes, but is not limited to, the following steps.
In step 1001, a first functional core sends an immediate primitive request to a second functional core in a first duty cycle.
The first functional core is any one of a plurality of functional cores in the first time sequence cluster, and the second functional core is any one of a plurality of functional cores in the second time sequence cluster.
The first working period is the current working period of the first time sequence cluster.
In step 1002, after receiving the immediate primitive request in the second work cycle, the second functional core stores the immediate primitive request in a cache.
And the second working period is the current working period of the second time sequence cluster, and is different from the first working period.
In step 1003, the second functional core sends the immediate primitive request to other functional cores in the second time-series cluster in a multicast manner in the second work cycle.
Other functional cores in the second time-sequential cluster receive and store the immediate primitive request. And other functional cores in the second time sequence cluster return multicast response after receiving the multicast immediate primitive request respectively.
In step 1004, after the second work cycle, a third functional core sends an immediate primitive response corresponding to the immediate primitive request to a fourth functional core.
The third functional core is any one of the plurality of functional cores in the second timing cluster, and the fourth functional core is any one of the plurality of functional cores in the first timing cluster. That is, the third functional core may be the second functional core or another functional core in the second timing cluster, and the fourth functional core may be the first functional core or another functional core in the first timing cluster.
Optionally, the third functional core sends the immediate primitive response corresponding to the immediate primitive request to the fourth functional core after it has executed a specified number of static primitive instructions, or after a target work cycle following the second work cycle.
In step 1006, after receiving the immediate primitive response, the functional cores in the first timing cluster perform data transmission with the functional cores in the second timing cluster.
Each of at least one of the functional cores in the first timing cluster can start data transmission with at least one functional core in the second timing cluster after receiving the immediate primitive response.
It should be noted that, for the details of each step in this embodiment, reference may be made to the related description in the above embodiments; they are not repeated here.
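The timing relationship in steps 1001 to 1006 can be illustrated with a small model. The sketch below is a minimal, hypothetical Python model, not the chip's implementation: it assumes each timing cluster has a fixed work-cycle length measured in global ticks and that requests arrive with zero latency. The names `TimingCluster` and `handshake` are illustrative only. It shows the property the method relies on: the responder caches the request during its current work cycle and answers only once that cycle has ended, so the requester never needs to share the responder's cycle boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class TimingCluster:
    """Hypothetical timing cluster whose work cycles have a fixed length in ticks."""
    name: str
    cycle_len: int                               # assumed constant work-cycle length
    cache: list = field(default_factory=list)    # cached immediate primitive requests

    def cycle_at(self, tick: int) -> int:
        """0-based index of the work cycle this cluster is in at a global tick."""
        return tick // self.cycle_len

    def cycle_end(self, tick: int) -> int:
        """First global tick after the work cycle that contains `tick`."""
        return (self.cycle_at(tick) + 1) * self.cycle_len


def handshake(requester: TimingCluster, responder: TimingCluster, send_tick: int) -> dict:
    """Trace one immediate-primitive handshake between two asynchronous clusters."""
    # Step 1002: the responder caches the request in its current work cycle.
    responder.cache.append((requester.name, send_tick))
    # Step 1004: the response is sent only after that work cycle has ended.
    response_tick = responder.cycle_end(send_tick)
    return {
        "request_cycle": requester.cycle_at(send_tick),
        "cached_in_cycle": responder.cycle_at(send_tick),
        "response_tick": response_tick,
        "response_cycle": responder.cycle_at(response_tick),
        # Step 1006: data transmission may start once the response is received.
        "transfer_start_tick": response_tick,
    }


if __name__ == "__main__":
    cluster1 = TimingCluster("cluster1", cycle_len=10)  # cycles S1, S2, ... (assumed length)
    cluster2 = TimingCluster("cluster2", cycle_len=7)   # cycles K1, K2, ... (assumed length)
    print(handshake(cluster1, cluster2, send_tick=13))
```

With these assumed lengths, the request is issued in cluster 1's cycle index 1 (S2), cached in cluster 2's cycle index 1 (K2), and answered at tick 14, the start of K3, while cluster 1 is still in S2. This mirrors the timing of the illustrative example that follows.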
In one illustrative example, the principle of many-to-many transfer between two asynchronous timing clusters (i.e., data transfer between functional cores 1 to 3 in timing cluster 1 and functional cores 4 to 6 in timing cluster 2) is shown in FIG. 11. Functional core 1 obtains static primitive instruction 2 of work cycle S2 after work cycle S2 starts and, according to the inter-core communication task indicated by static primitive instruction 2, sends an immediate primitive request to functional core 4 in work cycle S2. After receiving the immediate primitive request in work cycle K2, functional core 4 stores it in the cache, sends it to functional core 5 by multicast, and increments the designated counter by one. Functional core 5 performs a similar receive-and-store operation, sends the immediate primitive request to functional core 6 by multicast, and increments the designated counter by one. After receiving the multicast immediate primitive request, functional core 5 returns a multicast response to functional core 4, and functional core 6 returns a multicast response to functional core 5. After receiving the multicast response, functional cores 4 and 5 each decrement the designated counter by one. When functional cores 4 to 6 have completed the total computation task of work cycle K2 and the designated counter is zero, work cycle K2 ends. After work cycle K3 starts, functional core 5 executes the immediate primitive instruction indicated by the immediate primitive request and returns an immediate primitive response to functional core 3.
After receiving the immediate primitive response in work cycle S2, functional core 3 sends it to functional core 2 by multicast, and functional core 2 sends it to functional core 1 by multicast. Each of functional cores 1 to 3 performs data transmission with functional cores 4 to 6 after receiving the immediate primitive response.
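The designated-counter rule in this example can be modelled as a short sketch. This is hypothetical Python: it assumes one counter per functional core (the text does not say whether the counter is per core or shared, and the end-of-cycle condition is the same either way), and the names `FunctionalCore`, `multicast_send`, `multicast_respond` and `cycle_can_end` are illustrative. A core increments its counter when it forwards the request by multicast and decrements it when the matching multicast response comes back; the work cycle may end only when all cores have finished their computation and every counter is zero.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FunctionalCore:
    core_id: int
    compute_done: bool = False            # finished its share of this work cycle's task
    pending: int = 0                      # designated counter: outstanding multicast responses
    cached_request: Optional[str] = None  # cached immediate primitive request


def multicast_send(sender: FunctionalCore, receiver: FunctionalCore) -> None:
    """Forward the cached immediate primitive request and count the pending response."""
    receiver.cached_request = sender.cached_request  # receiver stores the request
    sender.pending += 1


def multicast_respond(sender: FunctionalCore) -> None:
    """The receiver's multicast response arrives back at the sender."""
    if sender.pending == 0:
        raise RuntimeError(f"core {sender.core_id} has no outstanding multicast")
    sender.pending -= 1


def cycle_can_end(cores: Iterable[FunctionalCore]) -> bool:
    """The work cycle ends only when all computation is done and all counters are zero."""
    return all(c.compute_done and c.pending == 0 for c in cores)


if __name__ == "__main__":
    core4, core5, core6 = FunctionalCore(4), FunctionalCore(5), FunctionalCore(6)
    core4.cached_request = "immediate primitive request from core 1"
    multicast_send(core4, core5)          # 4 -> 5, counter of core 4 becomes 1
    multicast_send(core5, core6)          # 5 -> 6, counter of core 5 becomes 1
    cluster2 = [core4, core5, core6]
    for core in cluster2:
        core.compute_done = True
    print(cycle_can_end(cluster2))        # False: multicast responses still outstanding
    multicast_respond(core4)              # core 5 answers core 4
    multicast_respond(core5)              # core 6 answers core 5
    print(cycle_can_end(cluster2))        # True
```

Keeping the counter check separate from the computation flag makes explicit that finishing the computation alone is not enough to close work cycle K2.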
Similarly, at the level of data interaction among multiple asynchronous timing clusters, there are four cases: one-to-one, one-to-many, many-to-one and many-to-many. Two exemplary embodiments, one-to-many and many-to-one, are described below; the one-to-one and many-to-many cases can be inferred by analogy by those skilled in the art and are not described here.
In a possible implementation, referring to fig. 12, this embodiment illustrates the asynchronous data transmission process between one asynchronous timing cluster and a plurality of timing clusters (for example, the asynchronous timing cluster is a first timing cluster, and the plurality of timing clusters are a second timing cluster and a third timing cluster) using the method in the above-mentioned neuromorphic chip system. The method includes, but is not limited to, the following steps.
In step 1201, the functional core of the first timing cluster sends an immediate primitive request to the functional core of the second timing cluster and the functional core of the third timing cluster in a first work cycle.
The functional core of the first timing cluster is any one functional core in the first timing cluster; the functional core of the second timing cluster is any one functional core in the second timing cluster; and the functional core of the third timing cluster is any one functional core in the third timing cluster. The first timing cluster is asynchronous with both the second timing cluster and the third timing cluster. The second and third timing clusters may be synchronous or asynchronous with each other.
Two timing clusters are asynchronous when their work cycles are asynchronous, and synchronous when their work cycles are synchronized.
The first work cycle is the current work cycle of the first timing cluster.
It should be noted that the functional core of the first timing cluster may send the immediate primitive request to the functional core of the second timing cluster first and then to the functional core of the third timing cluster, or to the third timing cluster first and then to the second, or to both at the same time. The embodiments of the present disclosure do not limit this order.
In step 1202, after receiving the immediate primitive request in a second work cycle, the functional core of the second timing cluster stores it in a cache and carries out the multicast transmission of the immediate primitive request, and the corresponding multicast responses, within the second timing cluster.
The second work cycle is the current work cycle of the second timing cluster and is different from the first work cycle.
In step 1203, after the second work cycle, the functional core of the second timing cluster sends the immediate primitive response corresponding to the immediate primitive request to the functional core of the first timing cluster.
Optionally, the functional core of the second timing cluster sends the immediate primitive response to the functional core of the first timing cluster after it has executed a specified number of static primitive instructions, or after a target work cycle following the second work cycle.
In step 1204, after receiving the immediate primitive request in a third work cycle, the functional core of the third timing cluster stores it in the cache and carries out the multicast transmission of the immediate primitive request, and the corresponding multicast responses, within the third timing cluster.
The third work cycle is the current work cycle of the third timing cluster and is different from the first work cycle.
In step 1205, after the third work cycle, the functional core of the third timing cluster sends the immediate primitive response corresponding to the immediate primitive request to the functional core of the first timing cluster.
Optionally, the functional core of the third timing cluster returns the immediate primitive response to the functional core of the first timing cluster after it has executed a specified number of static primitive instructions, or after a target work cycle following the third work cycle.
It should be noted that steps 1202 and 1203 (the functional core of the second timing cluster receiving the immediate primitive request and returning the immediate primitive response) and steps 1204 and 1205 (the same process for the functional core of the third timing cluster) may be executed in parallel; the embodiments of the present disclosure do not limit this.
In step 1206, after receiving the immediate primitive responses sent by the functional cores of the second and third timing clusters, the functional core of the first timing cluster performs data transmission with the functional cores of the second and third timing clusters.
It should be noted that, in the first timing cluster, the functional core that sends the immediate primitive request, the functional core that receives the immediate primitive response (i.e., the response sent by the functional core of the second timing cluster), and the functional core that performs data transmission with the second timing cluster may be the same or different. Likewise, in the second timing cluster, the functional core that receives the immediate primitive request (i.e., the request sent by the functional core of the first timing cluster), the functional core that returns the immediate primitive response, and the functional core that performs data transmission with the first timing cluster may be the same or different; the same holds for the corresponding functional cores in the third timing cluster. The present disclosure does not limit this.
It should be noted that the details of each step in this embodiment are similar to the related description in the above embodiments and are not repeated here.
In one illustrative example, the principle of one-to-many asynchronous data transfer from timing cluster 1 to timing clusters 2 and 3 is shown in FIG. 13. The functional core of timing cluster 1 obtains static primitive instruction 2 of work cycle S2 after work cycle S2 starts and, according to the inter-core communication task indicated by static primitive instruction 2, sends an immediate primitive request to the functional cores of timing clusters 2 and 3 in work cycle S2. After receiving the immediate primitive request in work cycle K2, the functional core of timing cluster 2 stores it in a cache and carries out the multicast transmission and multicast responses of the request within timing cluster 2; it then sends the corresponding immediate primitive response to the functional core of timing cluster 1 in work cycle K3. After receiving the immediate primitive request in work cycle R1, the functional core of timing cluster 3 stores it in a cache and carries out the multicast transmission and multicast responses of the request within timing cluster 3; it then sends the corresponding immediate primitive response to the functional core of timing cluster 1 in work cycle R2.
After the functional core of timing cluster 1 has received, in work cycle S2, the immediate primitive responses sent by the functional cores of timing clusters 2 and 3, data interaction among timing clusters 1, 2 and 3 is established and data transmission starts.
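Because the second and third timing clusters answer on their own schedules (steps 1202/1203 and 1204/1205 may run in parallel), their responses can reach timing cluster 1 in either order. A minimal hypothetical sketch of the requester-side bookkeeping for step 1206, assuming the requester simply records which target clusters still owe a response; the class `OneToManyRequester` and the cluster labels are illustrative, not part of the disclosure:

```python
from typing import Iterable


class OneToManyRequester:
    """Hypothetical bookkeeping for a functional core of timing cluster 1."""

    def __init__(self, targets: Iterable[str]) -> None:
        self.awaiting = set(targets)   # clusters that still owe an immediate primitive response
        self.responded = []            # arrival order of responses, for tracing

    def on_response(self, target: str) -> None:
        if target not in self.awaiting:
            raise ValueError(f"unexpected or duplicate response from {target}")
        self.awaiting.remove(target)
        self.responded.append(target)

    def ready_to_transfer(self) -> bool:
        # Step 1206: start data transmission only after every target has responded.
        return not self.awaiting


if __name__ == "__main__":
    requester = OneToManyRequester(["cluster2", "cluster3"])
    requester.on_response("cluster3")       # e.g. sent in R2
    print(requester.ready_to_transfer())    # False: cluster 2 has not answered yet
    requester.on_response("cluster2")       # e.g. sent in K3
    print(requester.ready_to_transfer())    # True
```

Using a set makes the readiness check independent of the order in which the responses arrive.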
In another possible implementation, referring to fig. 14, this embodiment illustrates the asynchronous data transmission process between a plurality of asynchronous timing clusters and one timing cluster (for example, the plurality of asynchronous timing clusters are a first timing cluster and a third timing cluster, and the one timing cluster is a second timing cluster) using the method in the above-mentioned neuromorphic chip system. The method includes, but is not limited to, the following steps.
In step 1401, the functional core of the first timing cluster sends an immediate primitive request to the functional core of the second timing cluster in a first work cycle.
The functional core of the first timing cluster is any one functional core in the first timing cluster, and the functional core of the second timing cluster is any one functional core in the second timing cluster.
The first work cycle is the current work cycle of the first timing cluster.
The first timing cluster and the second timing cluster are two asynchronous timing clusters, that is, two timing clusters whose work cycles are asynchronous.
In step 1402, the functional core of the third timing cluster sends an immediate primitive request to the functional core of the second timing cluster in a third work cycle.
The functional core of the third timing cluster is any one functional core in the third timing cluster. The second timing cluster and the third timing cluster are two asynchronous timing clusters.
The third work cycle is the current work cycle of the third timing cluster and is different from the second work cycle.
In step 1403, after receiving, in a second work cycle, the immediate primitive request sent by the functional core of the first timing cluster, the functional core of the second timing cluster stores it in the cache and carries out the multicast transmission and multicast responses of the request within the second timing cluster.
The second work cycle is the current work cycle of the second timing cluster and is different from the first work cycle.
In step 1404, after the second work cycle, the functional core of the second timing cluster sends the immediate primitive response corresponding to the immediate primitive request to the functional core of the first timing cluster.
Optionally, the functional core of the second timing cluster sends this immediate primitive response to the functional core of the first timing cluster after it has executed a specified number of static primitive instructions, or after a target work cycle following the second work cycle.
In step 1405, after receiving the immediate primitive response sent by the functional core of the second timing cluster, the functional core of the first timing cluster performs data transmission with the functional core of the second timing cluster.
In step 1407, after the second work cycle, the functional core of the second timing cluster sends the immediate primitive response corresponding to the immediate primitive request from the third timing cluster to the functional core of the third timing cluster.
Optionally, the functional core of the second timing cluster sends this immediate primitive response to the functional core of the third timing cluster after it has executed a specified number of static primitive instructions, or after a target work cycle following the second work cycle.
Optionally, after the second work cycle, the functional core of the second timing cluster returns the immediate primitive responses to the two immediate primitive requests in the order in which the requests were received.
In step 1408, after receiving the immediate primitive response sent by the functional core of the second timing cluster, the functional core of the third timing cluster performs data transmission with the functional core of the second timing cluster.
It should be noted that the process in which the functional core of the second timing cluster receives the immediate primitive request from the first timing cluster, returns the immediate primitive response and transmits data, and the corresponding process for the immediate primitive request from the third timing cluster, may be executed in sequence or in parallel; the embodiments of the present disclosure do not limit the execution order.
In one illustrative example, the principle of many-to-one asynchronous data transfer from timing clusters 1 and 3 to timing cluster 2 is shown in FIG. 15. The functional core of timing cluster 1 obtains static primitive instruction 2 of work cycle S2 after work cycle S2 starts and, according to the inter-core communication task indicated by static primitive instruction 2, sends an immediate primitive request to the functional core of timing cluster 2 in work cycle S2. After receiving the immediate primitive request in work cycle K2, the functional core of timing cluster 2 stores it in a cache and carries out the multicast transmission and multicast responses of the request within timing cluster 2; it then sends the corresponding immediate primitive response to the functional core of timing cluster 1 in work cycle K3. After receiving this response, the functional core of timing cluster 1 performs data transmission with the functional core of timing cluster 2. The functional core of timing cluster 3 obtains static primitive instruction 2 of work cycle R2 after work cycle R2 starts and, according to the inter-core communication task indicated by static primitive instruction 2, sends an immediate primitive request to the functional core of timing cluster 2 in work cycle R2.
After receiving this immediate primitive request in work cycle K2, the functional core of timing cluster 2 stores it in a cache and carries out the multicast transmission and multicast responses of the request within timing cluster 2; it then sends the corresponding immediate primitive response to the functional core of timing cluster 3 in work cycle K4. After receiving this response, the functional core of timing cluster 3 performs data transmission with the functional core of timing cluster 2.
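The optional rule that the functional core of the second timing cluster answers the two requests in the order they were received amounts to a first-in, first-out queue. The sketch below is a hypothetical Python model: the class `ManyToOneResponder` and its `per_cycle` parameter (how many responses are issued per work cycle) are assumptions, not details given in the text. With one response per cycle starting at cycle index 3, it reproduces the pattern of the FIG. 15 example, where timing cluster 1 is answered in K3 and timing cluster 3 in K4.

```python
from collections import deque


class ManyToOneResponder:
    """Hypothetical functional core of timing cluster 2 serving several requesters."""

    def __init__(self) -> None:
        self.cache = deque()              # immediate primitive requests, in arrival order

    def on_request(self, requester: str) -> None:
        self.cache.append(requester)      # cached during the current work cycle

    def drain_responses(self, first_cycle: int, per_cycle: int = 1) -> list:
        """Return (work_cycle, requester) pairs, answering requests first-in, first-out.

        per_cycle is an assumed limit on responses issued per work cycle.
        """
        if per_cycle < 1:
            raise ValueError("per_cycle must be at least 1")
        schedule = []
        index = 0
        while self.cache:
            schedule.append((first_cycle + index // per_cycle, self.cache.popleft()))
            index += 1
        return schedule


if __name__ == "__main__":
    core = ManyToOneResponder()
    core.on_request("cluster1")           # arrives first (sent in S2)
    core.on_request("cluster3")           # arrives second (sent in R2)
    print(core.drain_responses(first_cycle=3))  # [(3, 'cluster1'), (4, 'cluster3')]
```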
It should be noted that many-core chips generally scale well, and a larger chip array, i.e., the above-mentioned neuromorphic chip system, can be formed by interconnecting many-core chips. The work cycles of functional cores on different many-core chips are generally asynchronous, so the communication method provided by the embodiments of the present disclosure can also be used for data transmission in a neuromorphic chip system. The many-core chips in the neuromorphic chip system may run globally synchronously, with the functional cores of different many-core chips exchanging data according to the communication method provided by the embodiments of the present disclosure. Alternatively, the many-core chips may run asynchronously, in which case the communication method is applied to data transmission between two or more asynchronous timing clusters across chips; for example, the first timing cluster is in a first chip, the second timing cluster is in a second chip, and the work cycles of the first chip and the second chip are asynchronous.
In summary, the embodiments of the present disclosure provide a many-core chip that can be divided into asynchronous timing clusters, in which data transmission between asynchronous timing clusters is implemented through a trigger-and-handshake mechanism based on immediate primitive requests. The mechanism also ensures the integrity of data transfer between corresponding functional cores of asynchronous timing clusters (one-to-one, many-to-one, one-to-many and many-to-many) through the following process: sending the immediate primitive request; receiving the immediate primitive request and multicasting it within the cluster; returning multicast responses; and returning the immediate primitive response and executing the data transfer. In addition, in a chip array system formed by a plurality of many-core chips that support the immediate primitive communication mechanism, asynchronous timing clusters in different chips can transmit data through the communication method provided by the embodiments of the present disclosure.
The following are apparatus embodiments of the present disclosure. For details not described in the apparatus embodiments, reference may be made to the technical details disclosed in the method embodiments above.
Referring to fig. 16, a schematic diagram of a many-core chip according to an exemplary embodiment of the present disclosure is shown. The many-core chip includes at least a first timing cluster 1610 and a second timing cluster 1620, whose work cycles are asynchronous. The many-core chip comprises:
a functional core in the first timing cluster 1610, configured to send an immediate primitive request to a functional core in the second timing cluster 1620, where the immediate primitive request instructs the functional core in the second timing cluster 1620 to return an immediate primitive response after the current work cycle of the second timing cluster 1620;
the functional core in the first timing cluster 1610 is further configured to perform data transmission with the functional core in the second timing cluster 1620 after receiving the immediate primitive response.
In one possible implementation,
the functional core in the first timing cluster 1610 is further configured to obtain a static primitive instruction of a first work cycle, where the static primitive instruction indicates a pre-configured inter-core communication task, and the first work cycle is the current work cycle of the first timing cluster 1610;
the functional core in the first timing cluster 1610 is further configured to send, according to the inter-core communication task, the immediate primitive request to the functional core in the second timing cluster 1620 in the first work cycle.
In another possible implementation,
a functional core in the second timing cluster 1620 is configured to store the immediate primitive request in a cache after receiving it in a second work cycle, where the second work cycle is the current work cycle of the second timing cluster 1620 and is different from the first work cycle;
the functional core in the second timing cluster 1620 is further configured to return the immediate primitive response corresponding to the immediate primitive request after the second work cycle.
In another possible implementation,
the functional core in the second timing cluster 1620 is further configured to return the immediate primitive response corresponding to the immediate primitive request after a specified number of static primitive instructions have been executed, or after a target work cycle following the second work cycle;
where the difference in cycle count between the target work cycle and the second work cycle is a preset threshold.
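The two optional triggers for returning the response can be written as a single predicate. This is a hypothetical Python sketch: the function name `response_due`, its parameters, and the assumption that the static-primitive-instruction count is measured from the work cycle in which the request was received are illustrative choices, not details stated in the text. It also assumes the response is never sent inside the receiving work cycle. Either configured condition is treated as sufficient, matching the "or" in the description; with neither configured, the response is due as soon as the second work cycle has ended.

```python
from typing import Optional


def response_due(receive_cycle: int,
                 current_cycle: int,
                 executed_since_receive: int = 0,
                 instr_quota: Optional[int] = None,
                 cycle_threshold: Optional[int] = None) -> bool:
    """Hypothetical check of when the immediate primitive response may be returned.

    receive_cycle: work cycle in which the request was cached (the second work cycle).
    executed_since_receive: static primitive instructions executed since then (assumed).
    instr_quota: the "specified number" of static primitive instructions.
    cycle_threshold: preset cycle difference between target and second work cycle.
    """
    if current_cycle <= receive_cycle:
        return False                                  # still inside the receiving work cycle
    conditions = []
    if instr_quota is not None:
        conditions.append(executed_since_receive >= instr_quota)
    if cycle_threshold is not None:
        target_cycle = receive_cycle + cycle_threshold
        conditions.append(current_cycle > target_cycle)  # only after the target work cycle
    # With neither option configured, respond right after the receiving work cycle.
    return any(conditions) if conditions else True


if __name__ == "__main__":
    print(response_due(2, 3))                      # default rule: cycle 2 has ended
    print(response_due(2, 3, cycle_threshold=2))   # target cycle 4 not yet finished
    print(response_due(2, 5, cycle_threshold=2))   # target cycle 4 has ended
```

Setting `cycle_threshold=0` makes the target work cycle equal to the second work cycle, so the predicate then behaves like the default rule.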
In another possible implementation,
the functional core in the first timing cluster 1610 is further configured to, after receiving the immediate primitive response, send data to the functional core in the second timing cluster 1620 and/or receive data sent by the functional core in the second timing cluster 1620, according to the static primitive instruction of the first work cycle.
In another possible implementation,
the many-core chip is further configured to divide its functional cores according to a total task to be executed, obtaining a plurality of timing clusters with asynchronous work cycles, where the plurality of timing clusters include at least the first timing cluster 1610 and the second timing cluster 1620;
wherein each timing cluster comprises one or more functional cores with synchronized work cycles, and each functional core is configured to perform part of the computation task of the total task.
In another possible implementation, the second timing cluster 1620 comprises a plurality of functional cores,
any functional core in the second timing cluster 1620 is further configured to, after receiving the immediate primitive request sent by the functional core in the first timing cluster 1610, send the immediate primitive request to other functional cores in the second timing cluster 1620 in a multicast manner in the current work cycle of the second timing cluster 1620.
In another possible implementation, the many-core chip is further configured such that:
each of the other functional cores in the second timing cluster returns a multicast response after receiving the multicast immediate primitive request;
the current work cycle of the second timing cluster ends when every functional core in the second timing cluster has completed its total computation task for that work cycle and the other functional cores in the second timing cluster have returned their multicast responses.
In another possible implementation, the first timing cluster 1610 includes a plurality of functional cores,
any one functional core in the first timing cluster 1610 is further configured to, after receiving the immediate primitive response, send the immediate primitive response to the other functional cores in the first timing cluster 1610 by multicast.
With regard to the many-core chip in the above embodiments, the specific manner in which the functional cores in the first timing cluster and the second timing cluster perform operations has been described in detail in the embodiments related to the method, and will not be described in detail here.
The disclosed embodiments also provide a non-transitory computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the methods in the various method embodiments described above.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can be personalized using state information of the computer-readable program instructions and can then execute those instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (11)
1. A communication method applied to a many-core chip, wherein the many-core chip comprises at least a first timing cluster and a second timing cluster whose working cycles are asynchronous, the method comprising:
a functional core in the first timing cluster sends an immediate primitive request to a functional core in the second timing cluster, wherein the immediate primitive request instructs the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster; and
after receiving the immediate primitive response, the functional core in the first timing cluster performs data transmission with the functional core in the second timing cluster.
2. The method of claim 1, wherein the sending, by the functional core in the first timing cluster, of the immediate primitive request to the functional core in the second timing cluster comprises:
the functional core in the first timing cluster acquires a static primitive instruction of a first working cycle, wherein the static primitive instruction indicates a preconfigured inter-core communication task, and the first working cycle is the current working cycle of the first timing cluster; and
the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster during the first working cycle according to the inter-core communication task.
3. The method of claim 2, wherein after the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the method further comprises:
after receiving the immediate primitive request in a second working cycle, the functional core in the second timing cluster stores the immediate primitive request in a cache, wherein the second working cycle is the current working cycle of the second timing cluster and differs from the first working cycle; and
the functional core in the second timing cluster returns the immediate primitive response corresponding to the immediate primitive request after the second working cycle.
4. The method of claim 3, wherein the returning, by the functional core in the second timing cluster, of the immediate primitive response corresponding to the immediate primitive request after the second working cycle comprises:
the functional core in the second timing cluster returns the immediate primitive response corresponding to the immediate primitive request after executing a specified number of static primitive instructions, or after a target working cycle that follows the second working cycle;
wherein the difference in cycle count between the target working cycle and the second working cycle is a preset threshold.
5. The method of claim 2, wherein the performing of data transmission with the functional core in the second timing cluster after receiving the immediate primitive response comprises:
after receiving the immediate primitive response, the functional core in the first timing cluster, according to the static primitive instruction of the first working cycle, sends data to be sent to the functional core in the second timing cluster and/or receives data sent by the functional core in the second timing cluster.
6. The method of any one of claims 1 to 5, wherein before the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the method further comprises:
dividing a plurality of functional cores in the many-core chip according to an overall computing task to be executed, to obtain a plurality of timing clusters whose working cycles are asynchronous, the plurality of timing clusters comprising at least the first timing cluster and the second timing cluster;
wherein each timing cluster comprises one or more functional cores whose working cycles are synchronized, each functional core being configured to execute a portion of the overall computing task.
7. The method of any one of claims 1 to 5, wherein the second timing cluster comprises a plurality of functional cores, and after the functional core in the first timing cluster sends the immediate primitive request to the functional core in the second timing cluster, the method further comprises:
after receiving the immediate primitive request sent by the functional core in the first timing cluster, any functional core in the second timing cluster multicasts the immediate primitive request to the other functional cores in the second timing cluster within the current working cycle of the second timing cluster.
8. The method of claim 7, further comprising:
each of the other functional cores in the second timing cluster returns a multicast response after receiving the multicast immediate primitive request; and
ending the current working cycle of the second timing cluster when every functional core in the second timing cluster has completed its computing task for the current working cycle and all of the other functional cores in the second timing cluster have returned the multicast response.
9. The method of any one of claims 1 to 5, wherein the first timing cluster comprises a plurality of functional cores, the method further comprising:
after receiving the immediate primitive response, any functional core in the first timing cluster multicasts the immediate primitive response to the other functional cores in the first timing cluster.
10. A many-core chip, comprising at least a first timing cluster and a second timing cluster whose working cycles are asynchronous, wherein:
a functional core in the first timing cluster is configured to send an immediate primitive request to a functional core in the second timing cluster, the immediate primitive request instructing the functional core in the second timing cluster to return an immediate primitive response after the current working cycle of the second timing cluster; and
the functional core in the first timing cluster is further configured to perform data transmission with the functional core in the second timing cluster after receiving the immediate primitive response.
11. A non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 9.
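Claims 1 to 5 describe a deferred handshake: the receiving timing cluster caches an immediate primitive request and answers it only once its own working cycle (optionally plus a preset number of further cycles) has ended, so the two clusters never need a common clock. The following is a minimal Python sketch of that timing behavior, assuming a hypothetical discrete tick clock in which every working cycle of the receiving cluster lasts a fixed number of ticks. The names `TimingCluster`, `response_tick`, `cycle_len`, and `threshold` are illustrative and do not come from the patent, and the alternative trigger in claim 4 (responding after a specified number of static primitive instructions) is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TimingCluster:
    """Receiving timing cluster whose cores share one working-cycle clock.

    Hypothetical model: every working cycle lasts `cycle_len` ticks.
    """
    cycle_len: int
    threshold: int = 0   # extra cycles to wait before responding (claim 4 variant)
    cycle: int = 0       # index of the current working cycle
    ticks_in_cycle: int = 0
    # Cached requests as (due_cycle, requester) pairs, oldest first (claim 3).
    buffer: deque = field(default_factory=deque)

    def receive_request(self, requester):
        # The request is not answered mid-cycle; it is cached and becomes due
        # once the current cycle (plus `threshold` later cycles) has ended.
        self.buffer.append((self.cycle + self.threshold, requester))

    def tick(self):
        """Advance one tick; return requesters whose responses are released now."""
        self.ticks_in_cycle += 1
        if self.ticks_in_cycle < self.cycle_len:
            return []
        finished = self.cycle          # this working cycle has just ended
        self.cycle += 1
        self.ticks_in_cycle = 0
        released = []
        while self.buffer and self.buffer[0][0] <= finished:
            released.append(self.buffer.popleft()[1])
        return released


def response_tick(cycle_len, request_tick, threshold=0, max_ticks=1000):
    """Return the tick at which the immediate primitive response is sent."""
    receiver = TimingCluster(cycle_len, threshold)
    for t in range(max_ticks):
        if t == request_tick:
            # Claim 2: a static primitive instruction in the sender's current
            # cycle triggers the immediate primitive request.
            receiver.receive_request("cluster1.core0")
        if receiver.tick():
            # Claim 5: on this response the sender may start its data transfer.
            return t
    return None


print(response_tick(cycle_len=4, request_tick=5))               # 7
print(response_tick(cycle_len=4, request_tick=5, threshold=2))  # 15
```

With 4-tick cycles, cycle 1 spans ticks 4 to 7, so a request arriving at tick 5 is answered when cycle 1 ends at tick 7; with a threshold of 2 the response waits until cycle 3 ends at tick 15. A FIFO buffer suffices in this model because the due cycle of each cached request never decreases in arrival order.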
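Claims 7 and 8 add an end-of-cycle rule for a cluster with several functional cores: the core that receives the immediate primitive request multicasts it to its peers, each peer returns a multicast response, and the working cycle ends only when every core has finished its computation and all responses are back. A minimal Python sketch of that rule follows, assuming a hypothetical tick-based model with one fixed network latency for both the multicast and the responses. `ClusterCycle`, `cycle_end`, `latency`, and the per-core `work` counts are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterCycle:
    """One working cycle of a timing cluster with several functional cores."""
    work: list                 # remaining compute ticks per core (hypothetical workload)
    latency: int = 1           # hypothetical network latency, in ticks
    t: int = 0
    acks_pending: set = field(default_factory=set)
    events: list = field(default_factory=list)   # (due_tick, kind, core)

    def deliver_request(self, receiver):
        # Claim 7: the core that received the immediate primitive request
        # multicasts it to every other core in the same cluster.
        for core in range(len(self.work)):
            if core != receiver:
                self.acks_pending.add(core)
                self.events.append((self.t + self.latency, "request", core))

    def step(self):
        """Advance one tick; return True once the working cycle may end."""
        self.t += 1
        for event in [e for e in self.events if e[0] <= self.t]:
            self.events.remove(event)
            _, kind, core = event
            if kind == "request":
                # Claim 8: each peer answers the multicast with a response.
                self.events.append((self.t + self.latency, "ack", core))
            else:
                self.acks_pending.discard(core)
        self.work = [max(0, w - 1) for w in self.work]
        # Claim 8: every core has finished its computation for this cycle
        # and every multicast response has been returned.
        return all(w == 0 for w in self.work) and not self.acks_pending


def cycle_end(work, receiver=None, request_at=None, latency=1, max_ticks=1000):
    """Return the tick at which the cluster's working cycle ends."""
    cycle = ClusterCycle(list(work), latency)
    while cycle.t < max_ticks:
        if request_at is not None and cycle.t == request_at:
            cycle.deliver_request(receiver)
        if cycle.step():
            return cycle.t
    return None


print(cycle_end([3, 5, 2]))                            # 5: compute-bound
print(cycle_end([3, 5, 2], receiver=0, request_at=4))  # 6: waits for responses
```

In the second call the computation finishes at tick 5, but the request multicast at tick 4 only reaches the peers at tick 5 and their responses return at tick 6, so the working cycle is extended by one tick. Tracking outstanding responses as a set of core indices keeps the end-of-cycle test a single expression.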
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110295620.2A CN112925739B (en) | 2021-03-19 | 2021-03-19 | Communication method applied to many-core chip, many-core chip and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112925739A (en) | 2021-06-08 |
CN112925739B CN112925739B (en) | 2024-04-09 |
Family
ID=76175727
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110295620.2A Active CN112925739B (en) | 2021-03-19 | 2021-03-19 | Communication method applied to many-core chip, many-core chip and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112925739B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19517539A1 (en) * | 1994-05-13 | 1995-11-16 | Mitsubishi Electric Corp | Protocol processor for a transmission network |
US5619499A (en) * | 1994-05-13 | 1997-04-08 | Mitsubishi Denki Kabushiki Kaisha | Protocol processor in communication network transferring data in asynchronous transfer mode |
CN103106173A (en) * | 2013-01-25 | 2013-05-15 | China North Industries Group Corporation 214th Research Institute, Suzhou R&D Center | Interconnection method among cores of multi-core processor |
CN109976925A (en) * | 2019-03-27 | 2019-07-05 | Beijing Yihui Information Technology Co., Ltd. | Method and system for real-time inter-core communication based on a hybrid multi-system |
CN112306946A (en) * | 2019-08-02 | 2021-02-02 | Tenstorrent Inc. | Overlays for networks of processor cores |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114968903A (en) * | 2022-04-21 | 2022-08-30 | Tsinghua University | External control circuit of many-core chip |
CN114968903B (en) * | 2022-04-21 | 2024-04-19 | Tsinghua University | External control circuit of many-core chip |
Also Published As
Publication number | Publication date |
---|---|
CN112925739B (en) | 2024-04-09 |
Similar Documents
Publication | Title |
---|---|
CN107145380B (en) | Virtual resource arranging method and device |
CN108647104B (en) | Request processing method, server and computer readable storage medium |
CN110389826B (en) | Method, apparatus and computer program product for processing a computing task |
US20120066692A1 (en) | Iteratively processing segments by concurrently transmitting to, processing by, and receiving from partnered process |
US8447954B2 (en) | Parallel pipelined vector reduction in a data processing system |
CN105159610A (en) | Large-scale data processing system and method |
JP2022105146A (en) | Acceleration system, acceleration method, and computer program |
CN112925739B (en) | Communication method applied to many-core chip, many-core chip and storage medium |
CN111049900B (en) | Internet of things flow calculation scheduling method and device and electronic equipment |
CN109582242B (en) | Address determination method and device for cascade memory array system and electronic equipment |
CN112631982B (en) | Data exchange method and device based on many-core architecture |
CN110445874A (en) | Conversation processing method, device, equipment and storage medium |
JPH11110362A (en) | Method for communicating data between computers |
CN105487929A (en) | Method for managing shared data of lens in cluster rendering process |
CN111245794B (en) | Data transmission method and device |
CN116028238A (en) | Computing engine communication method and device |
CN114201727A (en) | Data processing method, processor, artificial intelligence chip and electronic equipment |
CN118095351B (en) | Cooperative processing device and method for layer normalization calculation |
EP3343843B1 (en) | A control plane system and method for managing a data plane amongst a plurality of equipments |
CN103488530A (en) | Lock migration method and device |
CN110738017A (en) | Distributed integrated circuit simulation method and device, computing equipment and storage medium |
CN116662037B (en) | Processing method and device for shared memory, electronic equipment and storage medium |
CN113722053A (en) | Data access control circuit, method, electronic device, and computer-readable storage medium |
CN118193198B (en) | Programmable primitive information processing method and device for synchronization mechanism of distributed many-core system and electronic equipment |
Uehara | Metabolic Computing: Towards Truly Renewable Systems |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |