CN115756837A - Dynamic scheduling method and device for crossbar matrix weight, and chip - Google Patents
- Publication number
- CN115756837A CN202211397085.2A
- Authority
- CN
- China
- Prior art keywords
- queue
- resource occupation
- occupation state
- data request
- weight
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a method, a device, and a chip for dynamically scheduling crossbar switch matrix weights. The crossbar switch has a plurality of input ports and output ports. When a data request enters an input port, it first enters a cache space; each cache space comprises a plurality of queues, and each queue corresponds to one output port of the crossbar switch. Each input port receives a data request, parses the destination output port address carried in the request, and stores the data in the corresponding cache queue according to that address. An output port of the crossbar switch receives the resource occupation states transmitted for the queues and configures the weight of each queue. The weight configuration table serves as the basis of the queue polling arbitration sequence, and polling arbitration is performed on the data requests in each queue according to the priority relation of the queue weights. The invention avoids the situation in which a large amount of cache resources is occupied for a long time when a port receives a large burst of input data, and improves the utilization of cache resources in the crossbar switch.
Description
Technical Field
The invention relates to the technical field of crossbar switches, and in particular to a method, a device, and a chip for dynamically scheduling crossbar switch matrix weights.
Background
As the amount of on-chip computing data grows rapidly, the internal data interconnect increasingly becomes a bottleneck. The crossbar switch is responsible for interconnecting data among multiple ports within the chip. Because each input port of the crossbar switch carries a large data volume and traffic can arrive in short bursts, backlogged data may occupy cache resources for a long time and reduce resource utilization efficiency. Improving the utilization of cache resources in the crossbar switch is therefore very important.
Disclosure of Invention
The object of the present invention is to solve at least one of the technical drawbacks mentioned.
Therefore, the present invention is directed to a method, an apparatus, and a chip for dynamically scheduling weights of a crossbar switch matrix to solve the problems mentioned in the background art and overcome the disadvantages in the prior art.
In order to achieve the above object, an embodiment of the present invention provides a method for dynamically scheduling weights of a crossbar switch matrix, including the following steps:
Step S1, a crossbar switch is provided with a plurality of input ports and output ports, each input port is provided with a cache space, the cache spaces correspond to the input ports one to one, and when a data request enters an input port, the data request first enters the corresponding cache space;
Step S2, each input port receives the data request, parses the destination output port address carried in the data request, and stores the data in the corresponding queue according to the destination output port address; the data request quantity of each input port is counted, the resource occupation state of the current queue is set according to the data request quantity, and the resource occupation state of the queue is then sent to the corresponding output port;
Step S3, the output port of the crossbar switch configures the weight of each queue according to the resource occupation state to form a weight configuration table of the queues; with the weight configuration table as the basis of the queue polling arbitration sequence, polling arbitration is performed on the data requests in each queue according to the priority relation of the queue weights, and the data is output.
Preferably, in step S1, each of the buffer spaces includes a plurality of queues, and each of the queues corresponds to one of the egress ports of the crossbar according to a destination egress port address.
Preferably, in any of the foregoing solutions, in the step S2, a counter is set at each ingress port, where the counter is used to count a data request amount in a buffer space queue at the corresponding ingress port; and then dividing a plurality of resource occupation state intervals according to the data request quantity, corresponding each resource occupation state interval to one resource occupation state, further obtaining a plurality of resource occupation states, and identifying the resource occupation state of each queue at the port inlet.
Preferably, in any of the above solutions, in the step S2, a plurality of resource occupation state intervals are divided according to the data request amount, each resource occupation state interval corresponds to one signal line, and a queue in a resource occupation state corresponding to the resource occupation state interval transmits the data request to the corresponding egress port through the corresponding signal line.
Preferably, in any of the above solutions, at least two thresholds are set comprehensively according to the data request amount and the data capacity of the queue, and a plurality of resource occupation state intervals are divided according to the thresholds, where each resource occupation state interval corresponds to one resource occupation state.
Preferably, in step S3, each queue is marked with a weight value, and the weight value is used as the basis of the queue polling arbitration sequence: a queue with a high weight has the highest priority in polling, and all data requests in that queue are read preferentially; from a queue with a low weight, one data request is read, after which the queue is suspended and does not participate in the current arbitration.
The embodiment of the present invention further provides a dynamic scheduling apparatus for crossbar weights, including: the system comprises a cache space, a resource occupation state setting module, a weight configuration module and a polling arbitration module, wherein,
the crossbar switch is provided with a plurality of input ports and output ports, the cache space is arranged at each input port, and each cache space corresponds to the input port one by one, wherein when a data request enters the input port, the data request firstly enters the cache space; receiving the data request by each input port, analyzing a destination output port address carried in the data request, and storing the data into a corresponding cache queue according to the destination output port address;
the resource occupation state setting module is used for counting the data request quantity of each input port, setting the resource occupation state of the current queue according to the data request quantity, and then sending the resource occupation state of the queue to the corresponding output port;
the weight configuration module is arranged at the output port and used for configuring the weight of each queue according to the resource occupation state to form a weight configuration table of the queue;
and the polling arbitration module is used for taking the weight configuration table as the basis of the queue polling arbitration sequence, performing polling arbitration on the data requests in each queue according to the priority relation of the queue weights, and outputting data.
Preferably, in any of the above schemes, the dynamic crossbar weight scheduler further includes: the counter is used for counting the data request quantity in the corresponding cache space queue at the input port; then, the resource occupation state setting module divides a plurality of resource occupation state intervals according to the data request quantity, and corresponds each resource occupation state interval to one resource occupation state, so as to obtain a plurality of resource occupation states and identify the resource occupation state of each queue at the port inlet.
Preferably, in any of the above schemes, the resource occupation state setting module divides a plurality of resource occupation state intervals according to the data request amount, each resource occupation state interval corresponds to one signal line, and a queue in a resource occupation state corresponding to the resource occupation state interval transmits the data request to the corresponding egress port through the corresponding signal line.
Preferably, in any of the above schemes, the resource occupation state setting module performs comprehensive setting on at least two threshold values according to the data request amount and the data capacity of the queue, and divides a plurality of resource occupation state intervals according to the threshold values, where each resource occupation state interval corresponds to one resource occupation state.
Preferably, in any of the above schemes, the polling arbitration module is configured to mark each queue with a weight value and use the weight value as the basis of the queue polling arbitration sequence: a queue with a high weight has the highest priority in polling, and all data requests in that queue are read preferentially; from a queue with a low weight, one data request is read, after which the queue is suspended and does not participate in the current arbitration.
Another embodiment of the present invention provides a chip, where the chip includes the dynamic crossbar weight scheduling apparatus provided in the foregoing embodiment.
Compared with the prior art, the invention has the following beneficial effects: a scheduling weight is configured according to the amount of data cache resources occupied at each port of the crossbar switch; the larger the occupied amount, the larger the scheduling weight and the faster the data is scheduled and output. This avoids the situation in which a large amount of cache resources is occupied for a long time when a port receives a large burst of input data, and improves the utilization of cache resources in the crossbar switch.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a method for dynamic scheduling of weights of a crossbar according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a crossbar assembly according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a weight distribution table according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a dynamic crossbar weight scheduler according to an embodiment of the present invention.
Reference numerals:
1. cache space; 2. counter; 3. resource occupation state setting module; 4. weight configuration module; 5. polling arbitration module.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The invention provides a method, a device, and a chip for dynamically scheduling crossbar switch matrix weights. The crossbar switch comprises a plurality of input ports and a plurality of output ports, and internally consists of an ingress cache and an egress scheduler. Each input port of the crossbar switch is provided with a memory cache module, and the memory cache modules correspond to the input ports one to one. Each memory cache module consists of a plurality of queues, and each queue corresponds to one output port of the crossbar switch. Each input port receives data, parses the destination output port address carried in the data, and stores the data in the corresponding cache queue according to that address.
The dynamic scheduling method of the crossbar weights according to the present invention is described in detail below with reference to fig. 1 to 3.
As shown in fig. 1, the method for dynamically scheduling weights of a crossbar switch matrix according to the embodiment of the present invention includes the following steps:
Step S1, the crossbar switch is provided with a plurality of input ports and output ports, each input port is provided with a cache space, and the cache spaces correspond to the input ports one to one. When a data request enters an input port, the data request first enters a cache space; each cache space includes a plurality of queues, and each queue corresponds to one output port of the crossbar switch, as shown in FIG. 2.
Step S2, each input port receives the data request, parses the destination output port address carried in the data request, and stores the data in the corresponding cache queue according to the destination output port address. A counter is set for each input port and is used to count the data request quantity in the queue corresponding to that input port.
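The buffering described in steps S1 and S2 can be sketched in software as follows. This is a minimal illustration only, not the hardware design of the embodiment: the class name `IngressPort`, the dictionary-shaped request, and its `dest` field are hypothetical, and the counter is modelled simply as the current queue length.

```python
from collections import deque


class IngressPort:
    """One ingress port: a cache space holding one FIFO queue per egress port."""

    def __init__(self, num_egress):
        # One queue per destination egress port (hypothetical software model).
        self.queues = [deque() for _ in range(num_egress)]

    def receive(self, request):
        # Assumption: the request carries its destination egress port index.
        dest = request["dest"]
        self.queues[dest].append(request)

    def count(self, egress):
        # Plays the role of the counter: requests currently queued for `egress`.
        return len(self.queues[egress])


# A 6-port ingress buffer receiving five requests.
port = IngressPort(num_egress=6)
for dest in [2, 2, 5, 0, 2]:
    port.receive({"dest": dest, "payload": None})

counts = [port.count(e) for e in range(6)]
```

Here `counts` holds, for each egress port, the number of requests waiting at this ingress port, which is the quantity the later state classification works from.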
A plurality of resource occupation state intervals are divided according to the data request quantity counted by the counter, and each interval corresponds to one resource occupation state, which yields a plurality of resource occupation states. The resource occupation state of each queue is identified at the ingress port and set as the state of the current queue, and this state is then sent to the corresponding egress port.
Specifically, a plurality of resource occupation state intervals are divided according to the data request amount counted by the counter, each interval corresponds to one signal line, and the queue in the resource occupation state corresponding to the resource occupation state interval transmits the data request to the corresponding output port through the signal line.
In a further embodiment of the present invention, at least two threshold values are set comprehensively according to the data request amount counted by the counter and the data capacity of the queue, a plurality of resource occupation state intervals are divided according to the threshold values, and each resource occupation state interval corresponds to one resource occupation state.
It should be noted that the data request amount is the number of data requests received by the counter from the ingress port. The data capacity of a queue is the maximum amount of data that the queue can hold, which can be understood as the queue depth.
(1) Two threshold values
Setting a first threshold value and a second threshold value, dividing three intervals according to the first threshold value and the second threshold value, and further setting a resource occupation state according to the intervals.
When the data request quantity received by the queue at the input port is smaller than a first threshold value, the data request quantity in the queue is small, and the queue is marked as a state 0;
when the data request quantity received by the queue at the input port is larger than a first threshold value and smaller than a second threshold value, the data request quantity in the queue is in a medium unsaturated state and is marked as a state 1;
when the data request quantity received by the queue at the input port is larger than a second threshold value, the data request quantity in the queue is represented as a saturation state and marked as a state 2.
The crossbar switch (6 input ports and 6 output ports) is explained as an example.
The crossbar switch includes 6 ingress ports and 6 egress ports. Each ingress port has 6 queues, which correspond to the 6 egress ports respectively. Each egress port extracts data requests from its 6 corresponding queues in the 6 ingress ports. Each of the 6 queues at an ingress port is counted, i.e. the number of data requests in that queue is tracked.
When the counter counts 12 data requests, setting a first threshold value to be 4 and a second threshold value to be 8 according to the data request quantity at an input port and the data capacity of a queue, and carrying out state division as follows:
when the data request quantity received by the queue at the input port is less than a first threshold value 4, the queue requests are few, and the resource occupation state interval is divided into: 0-4, marked as resource occupation state 0;
when the amount of data requests received by a queue at an ingress port is greater than a first threshold 4 and less than a second threshold 8, it indicates that the queue has a certain number of data requests but not a lot, and the resource occupation state interval is divided as follows: 4-8, marked as resource occupation state 1;
when the amount of data requests received by the queue at the ingress port is greater than a second threshold value 8, it indicates that there are many data requests of the queue, that is, the queue is in a full request state, and at this time, dividing the resource occupation state interval into: 8-12, marked as resource occupation state 2;
These 3 intervals are represented electrically by 3 signal lines, each representing one state that the queue at this ingress port may be in. Each interval corresponds to one resource occupation state, so 3 resource occupation states are obtained.
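The two-threshold division above can be sketched as a small function. This is an illustrative sketch under stated assumptions: the text does not say which state a count exactly equal to a threshold falls into, so the code assumes it belongs to the higher state, and the one-hot list is only a software stand-in for the 3 signal lines.

```python
FIRST_THRESHOLD = 4   # from the worked example in the text
SECOND_THRESHOLD = 8  # from the worked example in the text


def occupancy_state(count):
    """Map a queue's request count to resource occupation state 0, 1 or 2."""
    # Assumption: a count equal to a threshold is placed in the higher state.
    if count < FIRST_THRESHOLD:
        return 0
    if count < SECOND_THRESHOLD:
        return 1
    return 2


def signal_lines(state, num_states=3):
    """One-hot stand-in for the signal lines: exactly one line is asserted."""
    return [1 if i == state else 0 for i in range(num_states)]


state = occupancy_state(6)
lines = signal_lines(state)
```

With 6 queued requests the queue falls in the interval 4-8, so `state` is 1 and only the second line in `lines` is asserted.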
(2) More than two threshold values
In addition, when more than two threshold values are set, a plurality of threshold intervals are divided according to requirements.
For example, when three threshold values are set, four intervals are divided as required.
When the quantity of the data requests received by the queue at the ingress port is smaller than a first threshold value, the data requests in the queue are few, and the status is marked as 0.
When the data request quantity received by the queue at the input port is larger than a first threshold value and smaller than a second threshold value, the data request quantity in the queue is in a medium unsaturated state, and the state is marked as state 1.
When the data request quantity received by the queue at the input port is larger than the second threshold value and smaller than the third threshold value, the data request quantity in the queue is in a saturated state and is marked as a state 2.
And when the data request quantity received by the queue at the input port is larger than a third threshold value, indicating that the data request quantity in the queue is in a high saturation state, and marking the state 3.
The crossbar switch (6 input ports and 6 output ports) will be described as an example.
The crossbar switch includes 6 ingress ports and 6 egress ports. Each ingress port has 6 queues, which correspond to the 6 egress ports respectively. Each egress port extracts data requests from its 6 corresponding queues in the 6 ingress ports. Each of the 6 queues at an ingress port is counted, i.e. the number of data requests in that queue is tracked.
When the counter counts 20 data requests, setting a first threshold value to be 4, a second threshold value to be 8 and a third threshold value to be 12 according to the data request quantity at the input port and the data capacity of the queue, and carrying out state division as follows:
when the data request quantity received by the queue at the input port is less than a first threshold value 4, the queue requests are few, and the resource occupation state interval is divided into: 0-4, marked as resource occupation state 0;
when the amount of data requests received by a queue at an ingress port is greater than a first threshold 4 and less than a second threshold 8, it indicates that the queue has a certain number of data requests but not a lot, and the resource occupation state interval is divided as follows: 4-8, marked as resource occupation state 1;
when the amount of data requests received by the queue at the ingress port is greater than the second threshold value 8 and less than the third threshold value 12, it indicates that there are more data requests of the queue, that is, the queue is in a multi-request state, and at this time, dividing the resource occupation state interval is: 8-12, marked as resource occupation state 2;
when the data request quantity received by the queue at the input port is greater than a third threshold value 12, the data request quantity in the queue is in a high saturation state, and the resource occupation state interval is divided into: 12-20, marked as resource occupied state 3.
These 4 intervals are represented electrically by 4 signal lines, each representing one state that the queue at this ingress port may be in. Each interval corresponds to one resource occupation state, so 4 resource occupation states are obtained.
It should be noted that 4 resource occupation states are usually sufficient.
The resource occupation states are states 0, 1, 2 and 3. The resource occupation state of each queue is identified at the input port and set as the state of the current queue. The 4 states correspond to 4 signal lines, through which the resource occupation state is transmitted to the output port corresponding to the queue, and the output port assigns weights according to the state.
It should be noted that the threshold may be set as needed, that is, when the data request amount received by the ingress port is known, the purpose of dynamically adjusting the resource occupation state of each queue may be achieved by adjusting the threshold.
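The effect of adjusting the thresholds can be illustrated with a generalized classifier that accepts any sorted list of thresholds. This is a sketch, not the embodiment's circuit: the function name `classify` is hypothetical, boundary counts are assumed to fall in the higher state, and the second threshold list is invented purely to show the adjustment.

```python
from bisect import bisect_right


def classify(count, thresholds):
    """Return the resource occupation state for a queue holding `count` requests.

    `thresholds` must be sorted ascending; k thresholds yield k + 1 states.
    Assumption: a count equal to a threshold belongs to the higher state.
    """
    return bisect_right(thresholds, count)


count = 10
state_before = classify(count, [4, 8, 12])  # thresholds from the example in the text
state_after = classify(count, [4, 8, 10])   # hypothetical lowered third threshold
```

The same count of 10 is reported as state 2 with the thresholds from the text and as state 3 once the third threshold is lowered, which is how threshold adjustment changes the state a queue reports.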
Step S3, the output port of the crossbar switch receives the transmitted resource occupation states of the queues and configures the weight of each queue according to its resource occupation state to form a weight configuration table of the queues. The weight configuration table determines, according to the resource occupation state derived from the counter, the weight priority of each queue in the arbitration sequence of the output port for each round of scheduling. With the weight configuration table as the basis of the queue polling arbitration sequence, polling arbitration is performed on the data requests in each queue according to the priority relation of the queue weights, and data is output.
Specifically, configuring the weight of each queue according to the resource occupation state includes: a weight 1 is assigned to the queue of state 0, a weight 2 is assigned to the queue of state 1, a weight 3 is assigned to the queue of state 2, and a weight 4 is assigned to the queue of state 3, wherein the processing priorities of weight 1, weight 2, weight 3, and weight 4 are sequentially increased.
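The state-to-weight assignment above amounts to a lookup table. The following sketch shows one way to express it; the names `STATE_TO_WEIGHT` and `build_weight_table` and the sample list of reported states are hypothetical.

```python
# Mapping stated in the text: state 0 -> weight 1, ..., state 3 -> weight 4.
STATE_TO_WEIGHT = {0: 1, 1: 2, 2: 3, 3: 4}


def build_weight_table(states):
    """Build one egress port's weight table from the states its queues report.

    `states[i]` is the resource occupation state reported by ingress port i
    for the queue destined to this egress port.
    """
    return [STATE_TO_WEIGHT[s] for s in states]


# Hypothetical states reported by 6 ingress ports to a single egress port.
reported_states = [0, 3, 1, 2, 0, 3]
weight_table = build_weight_table(reported_states)
```

A fuller queue therefore receives a larger weight, so it is served more often within a scheduling round.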
It should be noted that, since the 6 input ports of the crossbar switch are equivalently arranged, only one configuration table is needed.
Then, each queue is marked with a weight value, and the weight value is used as the basis of the queue polling arbitration sequence: a queue with a high weight has the highest priority in polling, and all data requests in that queue are read preferentially; from a queue with a low weight, one data request is read, after which the queue is suspended and does not participate in the current arbitration.
Specifically, the weight configuration table is located at the output port. The 6 input ports correspond to 6 resource occupation states, each of which is transmitted to the output port through signal lines, so the output port receives 6 states. These 6 states are assigned weights according to the weight requirement; for example, state 0 is assigned the lowest weight, weight 1, and state 3 is assigned the highest weight, weight 4. That is, a queue in state 0 is assigned weight 1, a queue in state 1 is assigned weight 2, a queue in state 2 is assigned weight 3, and a queue in state 3 is assigned weight 4. This completes the weight configuration, which determines the subsequent polling order.
The egress port randomly selects a queue of the 6 queues for arbitration. If a certain queue has a high weight, 4 requests are continuously read from the queue with the high weight; if the weight is low, only one request will be read from the queue with low weight, and then the queue is suspended and does not participate in the current arbitration.
When the states of all 6 queues have been transmitted to the egress port, and assuming that the egress port is in its initial state with all queues suspended, each of the 6 queues is marked with a weight value according to the correspondence described above (a queue in state 0 is assigned weight 1, a queue in state 1 weight 2, a queue in state 2 weight 3, and a queue in state 3 weight 4). Polling arbitration is then applied according to the weight values in a certain order; for example, if the queue from ingress port 0 has the highest priority in polling, the requests of that queue are read first.
When all queues are suspended, the weights are reassigned.
Referring to fig. 3, a typical weight configuration table is designed as follows, taking a 4 × 4 crossbar as an example:
The weight distribution table of each output port determines the weight held by each queue at that output port according to the resource occupation count state of each input port: among the queues of the input ports, a queue in state 0 is assigned weight 1, a queue in state 1 weight 2, a queue in state 2 weight 3, and a queue in state 3 weight 4. In this way, the resource occupation state of each input port is associated with a scheduling weight by indexing, and the weight distribution table is updated in real time.
Before each output port is scheduled, the indication states of all the input ports are in a suspended state, at this time, the weight value of each input port at the output port is set according to a weight distribution table, and one port at the highest priority is selected according to polling arbitration. When the output port finishes one-time data transmission, the corresponding weighted value of the input port is subtracted by 1. If the weight of a certain input port is reduced to 0, the port is set to be in a suspended state, the participation scheduling qualification is lost, and the next input port is selected according to the sequence of the polling arbitration priority; meanwhile, if the weight of a certain ingress port is 1 and the port is selected at this time, but there is no request currently in the corresponding ingress port queue, the port is directly suspended and loses the qualification of participating in scheduling.
When the queues of all input ports are suspended, the scheduler resets the state of all corresponding ports to not suspended, updates the queue weight values of all input ports according to the weight distribution table updated in real time, updates the polling arbitration priority, and starts a new round of scheduling.
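One scheduling round at a single output port, as described above, can be sketched as follows. This is a simplified software model under stated assumptions: the function name `schedule_round` and its `start` parameter are hypothetical, the polling order is taken to be a fixed cyclic order beginning at `start`, and the caller is expected to reload the weights from the weight distribution table before the next round.

```python
from collections import deque


def schedule_round(queues, weights, start=0):
    """Run one scheduling round for a single egress port.

    queues  -- one deque per ingress port, holding requests for this egress port
    weights -- weight of each ingress port, taken from the weight table
    start   -- assumed index of the ingress port with the highest polling priority
    Returns the (ingress_index, request) pairs in transmission order.
    """
    n = len(queues)
    sent = []
    for offset in range(n):
        port = (start + offset) % n
        remaining = weights[port]
        # Each transmission decrements the weight; the port is suspended when
        # its weight reaches 0 or its queue holds no request.
        while remaining > 0 and queues[port]:
            sent.append((port, queues[port].popleft()))
            remaining -= 1
    # All ports are now suspended; the round is over.
    return sent


queues = [deque(["a0", "a1", "a2", "a3", "a4"]), deque(["b0"]), deque()]
sent = schedule_round(queues, weights=[4, 2, 1])
leftover = list(queues[0])
```

In this run the first port sends four requests before its weight is exhausted, the second port sends its only request and is suspended early, and the empty third port is suspended immediately; the one request left in the first queue waits for the next round.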
For example, suppose the weight distribution table of output port a at time 1 is A: 1, B: 3, C. The weights are then updated according to the weight allocation table, and the polling priority order becomes C -> D -> A -> B. As described above, if no request is buffered in queue a0 (a count state other than 0 in the queue of the ingress port indicates that requests are buffered in some cases), then after completing 4 transmissions with ingress port D, egress port a directly updates the weights according to the weight allocation table, and the polling priority order becomes C -> D -> A -> B.
The input port of the invention classifies incoming requests and stores them in FIFOs that correspond one to one with the output ports. In an N×N structure, each input port has N FIFOs and the whole structure has N^2 FIFOs, namely the queues in FIG. 3. This design avoids the limitation of the traditional VOQ, in which one input port can be matched with only one output port at a time; here an input port can be matched with a plurality of output ports simultaneously. An egress port, however, can still be matched with only one ingress port at a time.
As shown in fig. 4, the dynamic scheduling apparatus for crossbar weights according to the embodiment of the present invention includes: the system comprises a cache space 1, a counter 2, a resource occupation state setting module 3, a weight configuration module 4 and a polling arbitration module 5.
Specifically, the crossbar switch is provided with a plurality of input ports and output ports, the cache space 1 is arranged at each input port, and each cache space 1 corresponds to each input port. When a data request enters an input port, the data request firstly enters a cache space 1, each cache space 1 comprises a plurality of queues, and each queue corresponds to an output port of a cross switch; and each input port receives the data request, analyzes a destination output port address carried in the data request, and stores the data into a corresponding cache queue according to the destination output port address.
The counter 2 is arranged at the ingress port and used for counting the data request quantity in the queue corresponding to the ingress port.
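The buffering and counting described above can be sketched in a few lines. This is a minimal software model, not the patented circuit: the names `DataRequest`, `IngressPort`, `receive` and `count` are hypothetical, and the queue length stands in for the hardware counter 2.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataRequest:
    dest_port: int  # destination egress port address carried in the request
    payload: str


@dataclass
class IngressPort:
    num_egress: int
    queues: list = field(init=False)

    def __post_init__(self):
        # One FIFO per egress port, so N ingress ports hold N*N queues in total.
        self.queues = [deque() for _ in range(self.num_egress)]

    def receive(self, req: DataRequest) -> None:
        # Parse the destination address and store the request in the matching queue.
        self.queues[req.dest_port].append(req)

    def count(self, egress: int) -> int:
        # Stand-in for counter 2: number of requests waiting for `egress`.
        return len(self.queues[egress])


port = IngressPort(num_egress=6)
port.receive(DataRequest(dest_port=2, payload="x"))
port.receive(DataRequest(dest_port=2, payload="y"))
port.receive(DataRequest(dest_port=5, payload="z"))
print(port.count(2), port.count(5), port.count(0))
```

Keeping one queue per destination is what lets a single ingress port serve several egress ports in the same cycle.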
The resource occupation state setting module 3 is configured to divide a plurality of resource occupation state intervals according to the data request amount, with each interval corresponding to one resource occupation state, so that a plurality of resource occupation states is obtained. It identifies the resource occupation state of each queue at the ingress port, sets the resource occupation state of the current queue, and then sends the resource occupation state of the queue to the corresponding egress port.
Specifically, the resource occupation state setting module 3 divides a plurality of resource occupation state intervals according to the data request amount, each interval corresponds to one signal line, and the queue in the resource occupation state corresponding to the resource occupation state interval transmits the data request to the corresponding output port through the signal line.
In a further embodiment of the present invention, the resource occupation state setting module 3 may further set at least two threshold values based on both the data request amount and the data capacity of the queue, and divide a plurality of resource occupation state intervals according to the threshold values, where each resource occupation state interval corresponds to one resource occupation state.
It should be noted that the data request amount is the number of data requests received by the counter from the ingress port. The data capacity of a queue is the maximum amount of data that the queue can hold, which can be understood as the queue depth.
(1) Two threshold values
The resource occupation state setting module 3 sets a first threshold value and a second threshold value, divides three intervals according to them, and sets the resource occupation state according to the interval:
when the data request amount counted by the counter is smaller than the first threshold value, the data request amount in the queue is small, and the resource occupation state setting module 3 marks the queue as state 0;
when the data request amount counted by the counter is greater than the first threshold value and less than the second threshold value, the queue is in a medium, unsaturated state, and the resource occupation state setting module 3 marks the queue as state 1;
when the data request amount counted by the counter is greater than the second threshold value, the queue is in a saturated state, and the resource occupation state setting module 3 marks the queue as state 2.
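The two-threshold classification can be illustrated with a short sketch. The function name `two_threshold_state` is hypothetical, and one point is an assumption: the text does not say which state applies when the count equals a threshold, so this sketch places such a count in the higher state.

```python
def two_threshold_state(count: int, first: int, second: int) -> int:
    """Map a queue's request count to resource occupation state 0, 1 or 2."""
    if count < first:
        return 0  # few requests
    if count < second:
        return 1  # medium, unsaturated (assumption: count == first lands here)
    return 2      # saturated (assumption: count == second lands here)


for c in (3, 5, 9):
    print(c, two_threshold_state(c, first=4, second=8))
```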
A crossbar switch with 6 ingress ports and 6 egress ports is taken as an example.
The crossbar switch includes 6 ingress ports and 6 egress ports. Each ingress port has 6 queues, which correspond to the 6 egress ports respectively. Each egress port extracts data requests from its 6 corresponding queues, one in each of the 6 ingress ports. Each of the 6 queues of an ingress port is counted, i.e. the number of data requests in that queue is tracked.
When the counter counts 12 data requests, the resource occupation state setting module 3 sets the first threshold value to 4 and the second threshold value to 8 according to the data request amount at the ingress port and the data capacity of the queue, and divides the states as follows:
when the data request amount received by the queue at the ingress port is less than the first threshold value 4, the queue has few requests; the resource occupation state interval is 0-4, and the resource occupation state setting module 3 marks the queue as resource occupation state 0;
when the data request amount received by the queue at the ingress port is greater than the first threshold value 4 and less than the second threshold value 8, the queue has a moderate number of data requests; the resource occupation state interval is 4-8, and the resource occupation state setting module 3 marks the queue as resource occupation state 1;
when the data request amount received by the queue at the ingress port is greater than the second threshold value 8, the queue has many data requests, i.e. it is in a full request state; the resource occupation state interval is 8-12, and the resource occupation state setting module 3 marks the queue as resource occupation state 2.
These 3 intervals are represented in the circuit by 3 signal lines, which indicate the state that the queue of this ingress port is in. Each interval corresponds to one resource occupation state, so 3 resource occupation states are obtained.
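As an illustration of the signal-line representation, the sketch below models the 3 lines as a one-hot tuple for the example thresholds 4 and 8. The one-hot encoding and the function name `state_lines` are assumptions; the text only says that each interval corresponds to one signal line. A count equal to a threshold is again placed in the higher state.

```python
def state_lines(count: int, thresholds=(4, 8)) -> tuple:
    """Return one bit per interval; exactly one bit is set for the active state."""
    # The state index is the number of thresholds the count has reached.
    state = sum(count >= t for t in thresholds)
    return tuple(int(i == state) for i in range(len(thresholds) + 1))


print(state_lines(2))   # count in interval 0-4
print(state_lines(6))   # count in interval 4-8
print(state_lines(10))  # count in interval 8-12
```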
(2) More than two threshold values
In addition, when the resource occupation state setting module 3 sets more than two threshold values, a corresponding number of intervals is divided as required.
For example, when the resource occupation state setting module 3 sets three threshold values, four intervals are divided.
When the data request amount received by the queue at the ingress port is smaller than the first threshold value, the queue has few data requests, and the resource occupation state setting module 3 marks the queue as state 0.
When the data request amount received by the queue at the ingress port is greater than the first threshold value and less than the second threshold value, the queue is in a medium, unsaturated state, and the resource occupation state setting module 3 marks the queue as state 1.
When the data request amount received by the queue at the ingress port is greater than the second threshold value and less than the third threshold value, the queue is in a saturated state, and the resource occupation state setting module 3 marks the queue as state 2.
When the data request amount received by the queue at the ingress port is greater than the third threshold value, the queue is in a highly saturated state, and the resource occupation state setting module 3 marks the queue as state 3.
A crossbar switch with 6 ingress ports and 6 egress ports is again taken as an example.
The crossbar switch includes 6 ingress ports and 6 egress ports. Each ingress port has 6 queues, which correspond to the 6 egress ports respectively. Each egress port extracts data requests from its 6 corresponding queues, one in each of the 6 ingress ports. Each of the 6 queues of an ingress port is counted, i.e. the number of data requests in that queue is tracked.
When the counter counts 20 data requests, the resource occupation state setting module 3 sets the first threshold value to 4, the second threshold value to 8, and the third threshold value to 12 according to the data request amount at the ingress port and the data capacity of the queue, and divides the states as follows:
when the data request amount received by the queue at the ingress port is less than the first threshold value 4, the queue has few requests; the resource occupation state interval is 0-4, and the resource occupation state setting module 3 marks the queue as resource occupation state 0;
when the data request amount received by the queue at the ingress port is greater than the first threshold value 4 and less than the second threshold value 8, the queue has a moderate number of data requests; the resource occupation state interval is 4-8, and the resource occupation state setting module 3 marks the queue as resource occupation state 1;
when the data request amount received by the queue at the ingress port is greater than the second threshold value 8 and less than the third threshold value 12, the queue has more data requests, i.e. it is in a multi-request state; the resource occupation state interval is 8-12, and the resource occupation state setting module 3 marks the queue as resource occupation state 2;
when the data request amount received by the queue at the ingress port is greater than the third threshold value 12, the queue is in a highly saturated state; the resource occupation state interval is 12-20, and the resource occupation state setting module 3 marks the queue as resource occupation state 3.
These 4 intervals are represented in the circuit by 4 signal lines, which indicate the state that the queue of this ingress port is in. Each interval corresponds to one resource occupation state, so 4 resource occupation states are obtained.
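The same classification extends to any number of thresholds. The sketch below uses Python's standard `bisect_right` on a sorted threshold list; the function name `occupancy_state` is hypothetical, and treating a count equal to a threshold as belonging to the higher state is an assumption, since the text leaves that boundary open.

```python
from bisect import bisect_right


def occupancy_state(count: int, thresholds: list) -> int:
    """Return the index of the interval that `count` falls into.

    `thresholds` must be sorted in ascending order; k thresholds give k + 1 states.
    """
    return bisect_right(thresholds, count)


thresholds = [4, 8, 12]  # first, second and third threshold from the example
for c in (3, 5, 11, 20):
    print(c, occupancy_state(c, thresholds))
```

Changing only the `thresholds` list re-partitions the states, which mirrors how the thresholds can be tuned without touching the rest of the logic.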
It should be noted that 4 resource occupation states are usually sufficient.
The resource occupation states are states 0, 1, 2 and 3. The resource occupation state of each queue at the ingress port is identified and the resource occupation state of the current queue is set. The 4 states correspond to 4 signal lines respectively; the resource occupation state is transmitted over these signal lines to the egress port corresponding to the queue, and the egress port assigns weights according to the state.
It should be noted that the thresholds may be set as needed: when the data request amount received by the ingress port is known, the resource occupation state of each queue can be adjusted dynamically by adjusting the thresholds.
The weight configuration module 4 is arranged at the egress port, receives the transmitted resource occupation state of each queue, and configures the weight of each queue according to the resource occupation state to form a weight configuration table of the queues.
Specifically, configuring the weight of each queue according to the resource occupation state includes: a weight 1 is assigned to the queue of state 0, a weight 2 is assigned to the queue of state 1, a weight 3 is assigned to the queue of state 2, and a weight 4 is assigned to the queue of state 3, wherein the processing priorities of weight 1, weight 2, weight 3, and weight 4 are sequentially increased.
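The state-to-weight mapping at an egress port can be sketched as a lookup followed by a sort. The names `STATE_TO_WEIGHT`, `build_weight_table` and `polling_order` are hypothetical, the ingress labels A to D are illustrative, and breaking ties between equal weights by label is an assumption that the text does not specify.

```python
# Weight assigned to each resource occupation state, as listed above.
STATE_TO_WEIGHT = {0: 1, 1: 2, 2: 3, 3: 4}


def build_weight_table(states: dict) -> dict:
    """Map each ingress port's reported state to a scheduling weight."""
    return {ingress: STATE_TO_WEIGHT[state] for ingress, state in states.items()}


def polling_order(weight_table: dict) -> list:
    """Order ingress ports from highest to lowest weight (ties broken by label)."""
    return sorted(weight_table, key=lambda p: (-weight_table[p], p))


reported = {"A": 0, "B": 2, "C": 3, "D": 1}  # states received over the signal lines
table = build_weight_table(reported)
print(table)
print(polling_order(table))
```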
The polling arbitration module 5 uses the weight configuration table as the basis of the queue polling arbitration sequence, performs polling arbitration on the data requests in each queue according to the priority relation of the weights of the queues, and outputs the data.
Specifically, the polling arbitration module 5 is configured to mark each queue with a weight value, and use the weight value as a basis of a queue polling arbitration order, where for a queue with a high weight, the queue is at the highest priority in polling, and all data requests in the queue are read preferentially; for a low-weighted queue, a data request is read from the low-weighted queue, and then the queue is suspended and does not participate in the current arbitration.
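One possible reading of this arbitration rule is sketched below for a single egress port and a single arbitration round. It is an interpretation, not the patented circuit: the function name `arbitrate_round` is hypothetical, "high weight" is taken to mean the largest weight among non-empty queues, and ties are broken by ingress label.

```python
from collections import deque


def arbitrate_round(queues: dict, weights: dict) -> list:
    """Serve one round: drain top-weight queues, take one request from the rest."""
    served = []
    nonempty = [p for p in queues if queues[p]]
    if not nonempty:
        return served
    top = max(weights[p] for p in nonempty)
    for p in sorted(nonempty, key=lambda p: (-weights[p], p)):
        if weights[p] == top:
            # Highest priority: read every request in the queue.
            while queues[p]:
                served.append(queues[p].popleft())
        else:
            # Lower weight: read one request, then sit out the rest of the round.
            served.append(queues[p].popleft())
    return served


queues = {"A": deque(["a1", "a2"]), "B": deque(["b1", "b2", "b3"]), "C": deque()}
weights = {"A": 1, "B": 3, "C": 4}
print(arbitrate_round(queues, weights))
```

Because lower-weight queues still get one request served per round, heavily loaded queues drain quickly without starving the others.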
The embodiment of the invention also provides a chip, which comprises the dynamic scheduling device for crossbar matrix weights provided by the embodiment of the invention.
Compared with the prior art, the invention has the following beneficial effects. Scheduling weights are configured according to the amount of data cache resources occupied at each port of the crossbar switch: the larger the resource occupation, the larger the scheduling weight and the faster the data is scheduled and output. This avoids a large amount of cache resources being occupied for a long time when a port receives a large burst of data, and improves the utilization of cache resources in the crossbar switch. By counting the requests in each independent FIFO of the ingress port, the request congestion at the ingress port can be recorded; with the threshold classification mechanism of the counter and the weight allocation table of the egress port, requests from heavily loaded ingress ports can be processed quickly, and the superimposed polling mechanism improves fairness.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It will be understood by those skilled in the art that the present invention includes any combination of the features described in the summary and detailed description above and illustrated in the accompanying drawings; for the sake of brevity, not every such combination is described here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (12)
1. A dynamic scheduling method for crossbar matrix weights, characterized by comprising the following steps:
step S1, a crossbar switch is provided with a plurality of ingress ports and egress ports, each ingress port is provided with a cache space, the cache spaces correspond to the ingress ports one to one, and when a data request enters an ingress port, the data request first enters the cache space;
step S2, each ingress port receives the data request, parses the destination egress port address carried in the data request, and stores the data in the corresponding queue according to the destination egress port address; the data request amount of each ingress port is counted, the resource occupation state of the current queue is set according to the data request amount, and the resource occupation state of the queue is then sent to the corresponding egress port;
step S3, the egress port of the crossbar switch configures the weight of each queue according to the resource occupation state to form a weight configuration table of the queues, uses the weight configuration table as the basis of the queue polling arbitration order, performs polling arbitration on the data requests in each queue according to the priority relation of the queue weights, and outputs the data.
2. The dynamic scheduling method for crossbar matrix weights according to claim 1, wherein in step S1, each of said cache spaces comprises a plurality of queues, and each of said queues corresponds to an egress port of the crossbar switch according to a destination egress port address.
3. The dynamic scheduling method for crossbar matrix weights according to claim 1, wherein in step S2, a counter is set at each of the ingress ports, and the counter is used for counting the data request amount in the cache space queue at the corresponding ingress port; then, a plurality of resource occupation state intervals are divided according to the data request amount, each resource occupation state interval corresponds to one resource occupation state, a plurality of resource occupation states are obtained, and the resource occupation state of each queue at the ingress port is identified.
4. The dynamic scheduling method for crossbar matrix weights according to claim 3, wherein in step S2, a plurality of resource occupation state intervals are divided according to the data request amount, each resource occupation state interval corresponds to one signal line, and a queue in the resource occupation state corresponding to the resource occupation state interval transmits the data request to the corresponding egress port through the corresponding signal line.
5. The dynamic scheduling method for crossbar matrix weights according to claim 3 or 4, wherein at least two threshold values are set according to the data request amount and the data capacity of the queue, a plurality of resource occupation state intervals are divided according to the threshold values, and each resource occupation state interval corresponds to one resource occupation state.
6. The dynamic scheduling method for crossbar matrix weights according to claim 1, wherein in step S3, each queue is marked with a weight value, and the weight value is used as the basis of the queue polling arbitration order, wherein a queue with a high weight is at the highest priority in polling, and all data requests in the queue are read preferentially; for a queue with a low weight, one data request is read from the queue, and the queue is then suspended and does not participate in the current arbitration.
7. A dynamic scheduling device for crossbar matrix weights, characterized by comprising: a cache space, a resource occupation state setting module, a weight configuration module and a polling arbitration module, wherein,
the crossbar switch is provided with a plurality of ingress ports and egress ports, the cache space is arranged at each ingress port, and the cache spaces correspond to the ingress ports one to one, wherein when a data request enters an ingress port, the data request first enters the cache space; each ingress port receives the data request, parses the destination egress port address carried in the data request, and stores the data in the corresponding cache queue according to the destination egress port address;
the resource occupation state setting module is used for counting the data request amount of each ingress port, setting the resource occupation state of the current queue according to the data request amount, and then sending the resource occupation state of the queue to the corresponding egress port;
the weight configuration module is arranged at the egress port and used for configuring the weight of each queue according to the resource occupation state to form a weight configuration table of the queues;
and the polling arbitration module uses the weight configuration table as the basis of the queue polling arbitration order, performs polling arbitration on the data requests in each queue according to the priority relation of the queue weights, and outputs the data.
8. The dynamic scheduling device for crossbar matrix weights of claim 7, further comprising: a counter used for counting the data request amount in the corresponding cache space queue at the ingress port; the resource occupation state setting module then divides a plurality of resource occupation state intervals according to the data request amount, each resource occupation state interval corresponds to one resource occupation state, a plurality of resource occupation states are obtained, and the resource occupation state of each queue at the ingress port is identified.
9. The dynamic scheduling device for crossbar matrix weights of claim 7, wherein the resource occupation state setting module divides a plurality of resource occupation state intervals according to the data request amount, each resource occupation state interval corresponds to one signal line, and a queue in the resource occupation state corresponding to the resource occupation state interval transmits the data request to the corresponding egress port through the corresponding signal line.
10. The dynamic scheduling device for crossbar matrix weights of claim 8 or 9, wherein the resource occupation state setting module sets at least two threshold values according to the data request amount and the data capacity of the queue, and divides a plurality of resource occupation state intervals according to the threshold values, and each resource occupation state interval corresponds to one resource occupation state.
11. The dynamic scheduling device for crossbar matrix weights of claim 7, wherein the polling arbitration module is configured to mark each of the queues with a weight value and use the weight value as the basis of the queue polling arbitration order, wherein a queue with a high weight is at the highest priority in polling, and all data requests in the queue are read preferentially; for a queue with a low weight, one data request is read from the queue, and the queue is then suspended and does not participate in the current arbitration.
12. A chip comprising the device of any one of claims 7-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211397085.2A CN115756837A (en) | 2022-11-09 | 2022-11-09 | Dynamic scheduling method and device for crossbar matrix weight, and chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211397085.2A CN115756837A (en) | 2022-11-09 | 2022-11-09 | Dynamic scheduling method and device for crossbar matrix weight, and chip |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115756837A (en) | 2023-03-07 |
Family
ID=85368494
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211397085.2A Pending CN115756837A (en) | 2022-11-09 | 2022-11-09 | Dynamic scheduling method and device for crossbar matrix weight, and chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115756837A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118295818A (en) * | 2024-06-05 | 2024-07-05 | 北京燧原智能科技有限公司 | Multi-source arbiter polling method, multi-source arbiter, chip and polling device |
CN118409870A (en) * | 2024-07-02 | 2024-07-30 | 沐曦科技(成都)有限公司 | User arbitration system for GPU |
CN118409870B (en) * | 2024-07-02 | 2024-09-20 | 沐曦科技(成都)有限公司 | User arbitration system for GPU |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 20240318; Address after: 10 Jialeng Road, Singapore # 09-11; Applicant after: Shenglong (Singapore) Pte. Ltd.; Country or region after: Singapore; Address before: Floor 16, No. 9, North Fourth Ring West Road, Haidian District, Beijing, 100080; Applicant before: SUNLUNE TECHNOLOGY DEVELOPMENT (BEIJING) Co.,Ltd.; Country or region before: China |