CN103106175B

CN103106175B - Processor array based on shared registers and pipeline processing

Info

Publication number: CN103106175B
Application number: CN201310027755.6A
Authority: CN
Inventors: 赵光焕; 胡志卷; 胡红旗; 刘君敏
Original assignee: Hangzhou Silan Microelectronics Co Ltd
Current assignee: Hangzhou Silan Microelectronics Co Ltd
Priority date: 2013-01-23
Filing date: 2013-01-23
Publication date: 2015-12-23
Anticipated expiration: 2033-01-23
Also published as: CN103106175A

Abstract

The invention provides a processor array based on shared registers and pipeline processing, comprising: a plurality of processor units divided among a plurality of pipeline stages, each pipeline stage comprising one or more processor units, the processor units of different pipeline stages being independent of one another; and a plurality of register files distributed among the pipeline stages, each pipeline stage comprising one register file, wherein the processor units within the same pipeline stage exchange data through the register file of that stage, and the register file of each pipeline stage passes data stage by stage to the register file of the next pipeline stage. The invention enables multiple processor units to work cooperatively, which helps improve the computational efficiency of the processor array.

Description

Processor array based on shared registers and pipeline processing

Technical field

The present invention relates to multiprocessor array technology, and in particular to a processor array based on shared registers and pipeline processing.

Background

A multiprocessor system employs two or more computing units, which can communicate with one another over a bus or an interconnection network. A processor array is an array made up of a larger number of processor units, in which a single control component directs every processor unit in the array to perform the relevant computations and operations on its own data.

At present, demand for complex real-time computation keeps growing, particularly in video and artificial intelligence, and this calls for computing resources on a larger scale. Conventional single-processor or multiprocessor units cannot meet such large-scale computing demand, nor can they satisfy the requirement for fast real-time response.

Parallel computing hardware such as a processor array is therefore needed to improve computational efficiency. When a processor array contains many processor units, however, a better scheme is needed to make the processor units work together.

Summary of the invention

The technical problem to be solved by the present invention is to provide a processor array based on shared registers and pipeline processing that enables multiple processor units to work cooperatively and helps improve the computational efficiency of the processor array.

To solve the above technical problem, the invention provides a processor array based on shared registers and pipeline processing, comprising:

a plurality of processor units divided among a plurality of pipeline stages, each pipeline stage comprising one or more processor units, the processor units of different pipeline stages being independent of one another; and

a plurality of register files distributed among the plurality of pipeline stages, each pipeline stage comprising one register file, wherein the processor units within the same pipeline stage exchange data through the register file of that stage, and the register file of each pipeline stage passes data stage by stage to the register file of the next pipeline stage.

According to one embodiment of the present invention, the processor array further comprises an initial register file connected to the register file of the first of the plurality of pipeline stages and used to store the raw data to be processed; when the processor array starts, the raw data is passed to the register file of the first pipeline stage.

According to one embodiment of the present invention, the processor array further comprises a result register file connected to the register file of the last of the plurality of pipeline stages; the register file of the last pipeline stage passes its data to the result register file.

According to one embodiment of the present invention, the processor array further comprises a global pipeline stage control module for controlling the start-up of the plurality of pipeline stages and the transitions between pipeline stages.

According to one embodiment of the present invention, the plurality of pipeline stages have the same pipeline stage time.

According to one embodiment of the present invention, the register files of adjacent pipeline stages transfer data by register-space copy.

According to one embodiment of the present invention, the processor unit comprises a MIPS core, an ARM core or a DSP core.

Compared with the prior art, the present invention has the following advantages:

The processor array of the embodiments of the invention organizes its processor units into multiple pipeline stages. Processor units within the same pipeline stage exchange data through the register file of that stage, processor units in different pipeline stages do not communicate with each other directly, and data is passed between adjacent pipeline stages through the register files. The whole processor array can therefore work together well, which helps improve processing efficiency.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the processor array according to an embodiment of the present invention.

Detailed description

The invention is further described below with reference to specific embodiments and the accompanying drawing, which should not be taken to limit the scope of protection of the invention.

Referring to Fig. 1, the processor array of this embodiment comprises a plurality of processor units PU. A processor unit PU may be a programmable logic unit in the general sense, capable of performing arithmetic, logic, shift, multiply-add and other operations; it may be, for example, a MIPS core, an ARM core or a DSP core, but is not limited to these.

The processor units PU are divided among a plurality of pipeline stages (Stage), namely pipeline stage 1, pipeline stage 2, ..., pipeline stage N, where N is a positive integer. The processor units PU of different pipeline stages are independent of one another and cannot exchange data directly; in other words, there is no direct communication link between processor units PU of different pipeline stages.

Using multiple pipeline stages can effectively reduce computation time: a computation that takes many cycles to complete is split across multiple pipeline stages, each with a shorter period, and the pipeline stages operate in parallel and independently of one another.

Each pipeline stage contains one register file: for example, pipeline stage 1 contains the stage 1 register file, pipeline stage 2 contains the stage 2 register file, ..., and pipeline stage N contains the stage N register file. Data is passed between pipeline stages through these register files. The execution time of each pipeline stage is set before start-up, and in a preferred embodiment all pipeline stages have the same pipeline stage time. A uniform pipeline stage time ensures that no pipeline stage overflows.

The processor units PU within the same pipeline stage exchange data through the register file of that stage, and all data within a pipeline stage, including intermediate variables and results, is stored in the register file of that stage. For example, every processor unit PU in pipeline stage 1 is connected to the stage 1 register file, and each of them stores the data that needs to be exchanged in the stage 1 register file, where it is shared with the other processor units PU in pipeline stage 1.
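The sharing of one register file by the processor units of a single pipeline stage can be illustrated with a minimal software sketch. This is only a model under stated assumptions, not the hardware design: the register file is represented as a Python dict, each processor unit as a function, and the names `run_stage`, `pu_sum` and `pu_scale` are hypothetical. The units are called one after another here, whereas in the described array they run in parallel.

```python
from typing import Callable, Dict, List

# Assumption: a register file is modelled as a mapping from register name to value.
RegisterFile = Dict[str, int]
# Assumption: a processor unit is modelled as a function that reads and writes
# only the register file of its own pipeline stage.
ProcessorUnit = Callable[[RegisterFile], None]


def pu_sum(regs: RegisterFile) -> None:
    """Hypothetical PU: stores an intermediate value in the shared register file."""
    regs["sum"] = regs["a"] + regs["b"]


def pu_scale(regs: RegisterFile) -> None:
    """Hypothetical PU: consumes the intermediate value written by pu_sum."""
    regs["scaled"] = regs["sum"] * 2


def run_stage(units: List[ProcessorUnit], regs: RegisterFile) -> RegisterFile:
    """Run every PU of one stage against that stage's single register file."""
    for unit in units:  # sequential stand-in for units that work in parallel
        unit(regs)
    return regs


stage1_regs: RegisterFile = {"a": 3, "b": 4}
run_stage([pu_sum, pu_scale], stage1_regs)
```

The two units never call each other; the only channel between them is the register file of their own stage, which also holds the intermediate variable `sum` and the result `scaled`.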

The register file of each pipeline stage passes data stage by stage to the register file of the next pipeline stage: once the current pipeline stage has finished processing its data, the content of its register file is passed to the register file of the next pipeline stage. For example, once every processor unit PU in pipeline stage 1 has completed its operation and the pipeline stage time is reached, the content of the stage 1 register file is passed to the stage 2 register file in pipeline stage 2. As a non-limiting example, the register files of adjacent pipeline stages transfer data by register-space copy.
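The stage-by-stage hand-over can be sketched as a copy of each register file into its successor. The function name `shift_register_files` and the dict representation are assumptions made for illustration; the text only states that register space is copied between adjacent stages, not how the copy is implemented.

```python
from typing import Dict, List

RegisterFile = Dict[str, int]  # assumption: register name -> value


def shift_register_files(stage_regs: List[RegisterFile]) -> RegisterFile:
    """Copy every stage's register file into the next stage.

    Returns the content that leaves the last stage. Stages are updated from
    last to first so that no register file is overwritten before it is copied.
    """
    leaving = dict(stage_regs[-1])
    for i in range(len(stage_regs) - 1, 0, -1):
        stage_regs[i] = dict(stage_regs[i - 1])  # register-space copy
    stage_regs[0] = {}  # stage 1 is now free to receive new data
    return leaving


regs = [{"x": 1}, {"x": 2}, {"x": 3}]
out = shift_register_files(regs)
```

Copying rather than aliasing keeps the stages independent: after the shift, a write in stage 1 cannot affect the data already handed to stage 2.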

In addition, the processor array of this embodiment comprises an initial register file connected to the register file of the first pipeline stage, which in this embodiment is the stage 1 register file. The initial register file stores the raw data to be processed; when the processor array starts, the stored raw data is passed to the stage 1 register file. Pipeline stage 1 is then triggered by a start control instruction, and each of its processor units PU begins operating on the raw data according to preset instructions.

The processor array of this embodiment also comprises a result register file connected to the register file of the last pipeline stage, which in this embodiment is the stage N register file; the stage N register file passes the operation result to the result register file. The result register file can also notify an external device to fetch the operation result.

The processor array of this embodiment also comprises a global pipeline stage control module (not shown in Fig. 1), which controls the start-up of each pipeline stage and the transitions between pipeline stages. When the processor array starts, after the global pipeline stage control module receives the start instruction of the preceding pipeline stage, it controls and starts all processor units PU in the following pipeline stage so that they begin working simultaneously. The global pipeline stage control module may also contain a counting clock; once it has counted a preset number of cycles (that is, the pipeline stage time), it triggers the register file of each pipeline stage to pass its data to the register file of the next pipeline stage.
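The counting clock of the control module can be modelled as a cycle counter that fires a transfer signal whenever the preset stage time elapses. The class `GlobalStageController`, its `tick` method and the parameter `stage_time_cycles` are hypothetical names for this sketch; the actual module is hardware and is not shown in Fig. 1.

```python
class GlobalStageController:
    """Software stand-in for the counting clock of the global control module."""

    def __init__(self, stage_time_cycles: int) -> None:
        if stage_time_cycles <= 0:
            raise ValueError("stage_time_cycles must be positive")
        self.stage_time_cycles = stage_time_cycles  # preset pipeline stage time
        self.count = 0       # cycles counted in the current stage period
        self.transfers = 0   # number of stage-to-stage transfers triggered

    def tick(self) -> bool:
        """Advance one clock cycle; return True when a transfer is triggered."""
        self.count += 1
        if self.count == self.stage_time_cycles:
            self.count = 0
            self.transfers += 1
            return True
        return False


controller = GlobalStageController(stage_time_cycles=4)
events = [controller.tick() for _ in range(8)]
```

With a stage time of 4 cycles, the transfer signal is raised on the 4th and 8th cycles. Because the same signal drives every stage, all register files hand over their data at the same moment.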

Still referring to Fig. 1, the working process of the processor array of this embodiment is described in detail below.

First, those skilled in the art will appreciate that the processor array shown in Fig. 1 may also be surrounded by memory components for loading instructions and data. To ensure that the processor array can process repetitive data at high speed, the instructions required by each processor unit PU are usually loaded into that processor unit PU in advance.

The raw data to be processed by the processor array is prepared and loaded into the initial register file; once loading is complete, the pipeline stages of the processor array can be started. Pipeline stage 1 is started first: the raw data in the initial register file is passed to the stage 1 register file, and all processor units PU in pipeline stage 1 begin working, performing the corresponding operations on the raw data in the stage 1 register file. If processor units PU in pipeline stage 1 need to exchange data during processing, they can pass the data through the stage 1 register file. While pipeline stage 1 is processing, the initial register file can load new data to be processed.

After pipeline stage 1 finishes processing, the next pipeline stage, pipeline stage 2, is started; this can be scheduled globally by the global pipeline stage control module. At this point, all or part of the content of the stage 1 register file is passed to the stage 2 register file and subsequent processing continues, keeping the whole system working as a pipeline and improving efficiency.

After the last pipeline stage (pipeline stage N) completes its processing, the final result data is passed to the result register file. Peripheral data read/write components can then read the result data from the result register file.

In this way, the whole processor array operates at high speed, in parallel and as a pipeline.
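The working process described above can be sketched end to end as a small simulation. Everything here is an illustrative assumption: each stage is collapsed into a single function (`stage1_program`, `stage2_program`), register files are dicts, and one loop iteration stands for one pipeline stage time. The function `run_pipeline` is a hypothetical name, not part of the described hardware.

```python
from typing import Callable, Dict, List, Optional

RegisterFile = Dict[str, int]                    # assumption: name -> value
StageProgram = Callable[[RegisterFile], None]    # assumption: all PUs of a stage


def run_pipeline(programs: List[StageProgram],
                 inputs: List[RegisterFile]) -> List[RegisterFile]:
    """Feed inputs through the stages; one loop iteration is one stage time."""
    if not programs:
        raise ValueError("at least one pipeline stage is required")
    stage_regs: List[Optional[RegisterFile]] = [None] * len(programs)
    pending = list(inputs)                       # data waiting to be loaded
    results: List[RegisterFile] = []             # models the result register file
    while pending or any(r is not None for r in stage_regs):
        # Last stage hands its content to the result register file.
        if stage_regs[-1] is not None:
            results.append(stage_regs[-1])
        # Register-space copy from each stage to the next, last to first.
        for i in range(len(stage_regs) - 1, 0, -1):
            prev = stage_regs[i - 1]
            stage_regs[i] = dict(prev) if prev is not None else None
        # Initial register file passes the next raw data to stage 1.
        stage_regs[0] = dict(pending.pop(0)) if pending else None
        # All stages that hold data work during this stage time.
        for program, regs in zip(programs, stage_regs):
            if regs is not None:
                program(regs)
    return results


def stage1_program(regs: RegisterFile) -> None:
    regs["y"] = regs["x"] + 1


def stage2_program(regs: RegisterFile) -> None:
    regs["z"] = regs["y"] * 10


outputs = run_pipeline([stage1_program, stage2_program], [{"x": 1}, {"x": 2}])
```

In the second iteration both stages hold data at once: stage 2 works on the first input while stage 1 already works on the second, which is the overlap that gives the pipeline its throughput.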

It should be noted that, to use the processor array as efficiently as possible, and leaving aside the time needed to load and store data, the number of instructions executed by each pipeline stage should be allocated sensibly so as to maximize overall efficiency.
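Why the per-stage workload matters can be shown with a small worked calculation. The model below is a common textbook simplification and an assumption of this sketch, not something stated in the text: the uniform stage time is taken to be the cycle count of the slowest stage, transfer and load/store time are ignored, and `pipeline_timing` is a hypothetical helper.

```python
from typing import List, Tuple


def pipeline_timing(stage_cycles: List[int], batches: int) -> Tuple[int, int, float]:
    """Return (stage_time, total_cycles, utilization) for a uniform stage time.

    stage_cycles: cycles of useful work each stage needs per batch of data.
    batches: number of data batches pushed through the pipeline.
    """
    stage_time = max(stage_cycles)                 # slowest stage sets the period
    total_cycles = (len(stage_cycles) + batches - 1) * stage_time
    utilization = sum(stage_cycles) / (len(stage_cycles) * stage_time)
    return stage_time, total_cycles, utilization


# Same total work (300 cycles per batch), split evenly versus unevenly.
balanced = pipeline_timing([100, 100, 100], batches=10)
skewed = pipeline_timing([40, 160, 100], batches=10)
```

Under these assumptions the balanced split finishes 10 batches in 1200 cycles with every stage fully busy, while the uneven split needs 1920 cycles and keeps the stages busy only 62.5% of the time, even though the work per batch is identical.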

In summary, the processor array of this embodiment has the following advantages:

By sharing a register file within a pipeline stage, the multiple processor units of that stage can share data in the same register file, while each processor unit can still carry out its own different operation in parallel.

By running multiple pipeline stages in parallel, computation time can be effectively reduced: a computation that takes many cycles is split across multiple pipeline stages with shorter periods. The pipeline stages work independently and simultaneously, and computation data can be passed between them by register-space copy.

Building on these two measures, the processor array of this embodiment is formed from multiple processor units within each pipeline stage and multiple pipeline stages, so that all processor units in the array can work independently at the same time, effectively improving overall processing capability.

Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may make possible changes and modifications without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be as defined by the claims.

Claims

1. A processor array based on shared registers and pipeline processing, characterized by comprising:

a plurality of register files distributed among the plurality of pipeline stages, each pipeline stage comprising one register file, wherein the processor units within the same pipeline stage exchange data through the register file of that stage, the register file of each pipeline stage passes data stage by stage to the register file of the next pipeline stage, and after the current pipeline stage completes data processing, the content of the register file of that stage is passed to the register file of the next pipeline stage.

2. The processor array according to claim 1, characterized by further comprising:

an initial register file connected to the register file of the first of the plurality of pipeline stages and used to store the raw data to be processed, wherein, when the processor array starts, the raw data is passed to the register file of the first pipeline stage.

3. The processor array according to claim 1, characterized by further comprising:

a result register file connected to the register file of the last of the plurality of pipeline stages, wherein the register file of the last pipeline stage passes its data to the result register file.

4. The processor array according to any one of claims 1 to 3, characterized by further comprising:

a global pipeline stage control module for controlling the start-up of the plurality of pipeline stages and the transitions between pipeline stages.

5. The processor array according to any one of claims 1 to 3, characterized in that the plurality of pipeline stages have the same pipeline stage time.

6. The processor array according to any one of claims 1 to 3, characterized in that the register files of adjacent pipeline stages transfer data by register-space copy.

7. The processor array according to any one of claims 1 to 3, characterized in that the processor unit comprises a MIPS core, an ARM core or a DSP core.