CN103106175B - Processor array based on shared registers and pipelined processing - Google Patents
Processor array based on shared registers and pipelined processing
- Publication number
- CN103106175B (application CN201310027755.6A)
- Authority
- CN
- China
- Prior art keywords
- pipelining
- stage
- register file
- processor
- processor array
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Multi Processors (AREA)
- Advance Control (AREA)
Abstract
The invention provides a processor array based on shared registers and pipelined processing, comprising: multiple processor units divided among multiple pipeline stages, where each pipeline stage contains one or more processor units and the processor units of different pipeline stages are independent of one another; and multiple register files distributed among the pipeline stages, where each pipeline stage contains one register file, the processor units within a pipeline stage exchange data through that stage's register file, and the register file in each pipeline stage passes its data, stage by stage, to the register file in the next pipeline stage. The invention enables multiple processor units to work cooperatively, which helps improve the computational efficiency of the processor array.
Description
Technical field
The present invention relates to multiprocessor array technology, and in particular to a processor array based on shared registers and pipelined processing.
Background
A multiprocessor system uses two or more computing units, which can communicate with one another over a bus or an interconnection network. A processor array is an array made up of a larger number of processor units, in which a single control component directs each processor unit to perform the relevant computations and operations on its own data.
At present, demand for real-time complex computation keeps growing, particularly in video and artificial intelligence, and this calls for larger-scale computing resources. Traditional single-processor or multiprocessor units cannot meet computation demands of this scale, nor can they meet the requirement for fast real-time response.
Parallel computing hardware such as processor arrays is therefore needed to improve computational efficiency. However, when a processor array contains many processor units, a better scheme is needed to make the processor units work cooperatively.
Summary of the invention
The technical problem to be solved by the present invention is to provide a processor array based on shared registers and pipelined processing that enables multiple processor units to work cooperatively and helps improve the computational efficiency of the processor array.
To solve this technical problem, the invention provides a processor array based on shared registers and pipelined processing, comprising:
Multiple processor units, divided among multiple pipeline stages, where each pipeline stage contains one or more processor units and the processor units of different pipeline stages are independent of one another;
Multiple register files, distributed among the multiple pipeline stages, where each pipeline stage contains one register file, the processor units within a pipeline stage exchange data through that stage's register file, and the register file in each pipeline stage passes its data, stage by stage, to the register file in the next pipeline stage.
According to one embodiment of the invention, the processor array further comprises an initial register file, connected to the register file in the first of the multiple pipeline stages and used to store the raw data to be processed. When the processor array starts, the raw data is transferred to the register file of the first pipeline stage.
According to one embodiment of the invention, the processor array further comprises a result register file, connected to the register file in the last of the multiple pipeline stages. The register file in the last pipeline stage transfers its data to the result register file.
According to one embodiment of the invention, the processor array further comprises a global pipeline stage control module for controlling the startup of the multiple pipeline stages and the transitions between pipeline stages.
According to one embodiment of the invention, the multiple pipeline stages have the same pipeline stage time.
According to one embodiment of the invention, the register files in adjacent pipeline stages transfer data by means of a register space copy.
According to one embodiment of the invention, the processor unit comprises a MIPS core, an ARM core, or a DSP core.
Compared with the prior art, the present invention has the following advantages:
The processor array of the embodiments organizes its processor units into multiple pipeline stages. Processor units within the same pipeline stage exchange data through that stage's register file, while processor units in different pipeline stages do not communicate with one another directly. Data is transferred between adjacent pipeline stages through the register files, so that the whole processor array can work together well, which helps improve processing efficiency.
Brief description of the drawings
Fig. 1 is a structural diagram of the processor array according to an embodiment of the present invention.
Detailed description
The invention is further described below with reference to specific embodiments and the drawing, which should not be taken to limit the scope of the invention.
With reference to Fig. 1, the processor array of the present embodiment comprises multiple processor units PU. A processor unit PU can be a programmable logic unit in the general sense, capable of arithmetic, logic, shift, multiply-add, and similar operations. It can be, for example, a MIPS core, an ARM core, or a DSP core, but is not limited to these.
The processor units PU are divided among multiple pipeline stages (Stage), namely pipeline stage 1, pipeline stage 2, ..., pipeline stage N, where N is a positive integer. The processor units PU of different pipeline stages are independent of one another and cannot exchange data directly. In other words, there is no direct communication link between the processor units PU of different pipeline stages.
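As a non-limiting illustration, the organization described above can be sketched as a small data structure. The class names `RegisterFile`, `ProcessorUnit`, and `PipelineStage` and the helper `build_array` are hypothetical names chosen for this sketch and do not come from the patent. The point shown is that each processor unit holds a reference only to the register file of its own stage, so it has no path to a processor unit in another stage.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterFile:
    """Registers shared by every processor unit in one pipeline stage."""
    regs: dict = field(default_factory=dict)


@dataclass
class ProcessorUnit:
    name: str
    regfile: RegisterFile  # the only channel this PU has to any other PU


@dataclass
class PipelineStage:
    index: int
    regfile: RegisterFile
    units: list


def build_array(units_per_stage):
    """Build one stage per entry; entry k is the PU count of stage k + 1."""
    stages = []
    for i, count in enumerate(units_per_stage, start=1):
        rf = RegisterFile()  # exactly one register file per stage
        units = [ProcessorUnit(f"PU{i}.{j}", rf) for j in range(1, count + 1)]
        stages.append(PipelineStage(i, rf, units))
    return stages


# Hypothetical configuration: three stages with 3, 2, and 4 processor units.
array = build_array([3, 2, 4])
for stage in array:
    print(stage.index, [u.name for u in stage.units])
```

The stage sizes used here are arbitrary. The text only requires that each pipeline stage contain one or more processor units.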
Using multiple pipeline stages can effectively reduce processing time: computation that would otherwise take many cycles to complete is split across multiple pipeline stages, each with a shorter cycle time, and all pipeline stages work in parallel and independently of one another.
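The time saving can be illustrated with a simple cycle count. The sketch below is an idealized model and not part of the patent: it assumes every stage takes the same stage time, that a new data item enters the array at every stage boundary, and that register copy and data loading times are zero. The function names and the example figures are hypothetical.

```python
def unpipelined_cycles(items, cycles_per_item):
    """Cycles needed when each item is processed start to finish, one at a time."""
    return items * cycles_per_item


def pipelined_cycles(items, stages, stage_time):
    """Cycles needed when the work is split into equal-length pipeline stages.

    The first item needs stages * stage_time cycles to pass through the array;
    every further item completes one stage_time later.
    """
    return (stages + items - 1) * stage_time


# Hypothetical workload: 100 items, 400 cycles of work each, split into 4 stages.
print(unpipelined_cycles(100, 400))   # 40000
print(pipelined_cycles(100, 4, 100))  # 10300
```

Under these assumptions the speedup approaches the number of stages as the number of items grows.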
Each pipeline stage contains one register file. For example, pipeline stage 1 contains the stage 1 register file, pipeline stage 2 contains the stage 2 register file, ..., and pipeline stage N contains the stage N register file. Data is transferred between pipeline stages through the register files. The execution time of each pipeline stage is set before startup, and in a preferred embodiment all pipeline stages have the same pipeline stage time. A uniform pipeline stage time ensures that the pipeline stages do not overflow.
The processor units PU within a pipeline stage exchange data through that stage's register file. All data storage within a pipeline stage takes place in that stage's register file, including the storage of intermediate variables and results. For example, each processor unit PU in pipeline stage 1 is connected to the stage 1 register file, and each processor unit PU in pipeline stage 1 writes the data it needs to exchange into the stage 1 register file, where it is shared with the other processor units PU in pipeline stage 1.
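A minimal sketch of this intra-stage sharing follows, assuming a register file modeled as a fixed-size list of integers. The class `StageRegisterFile`, the functions `pu_a` and `pu_b`, and the register assignments are hypothetical. The two units are run one after the other here, which sidesteps the question of how concurrent accesses are ordered, since the text does not specify this.

```python
class StageRegisterFile:
    """Register file shared by all processor units of one pipeline stage."""

    def __init__(self, size):
        self.regs = [0] * size

    def read(self, idx):
        return self.regs[idx]

    def write(self, idx, value):
        self.regs[idx] = value


def pu_a(rf):
    # Hypothetical PU: adds the inputs in r0 and r1, shares the sum via r2.
    rf.write(2, rf.read(0) + rf.read(1))


def pu_b(rf):
    # Hypothetical PU: picks up the intermediate value in r2, stores a result in r3.
    rf.write(3, rf.read(2) * 2)


rf = StageRegisterFile(4)
rf.write(0, 3)
rf.write(1, 4)
pu_a(rf)  # intermediate variable lands in the shared register file
pu_b(rf)  # a second PU of the same stage consumes it
print(rf.regs)  # [3, 4, 7, 14]
```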
The register file in each pipeline stage passes its data, stage by stage, to the register file in the next pipeline stage: once the current pipeline stage has finished processing its data, the contents of its register file are transferred to the register file in the next pipeline stage. For example, when every processor unit PU in pipeline stage 1 has completed its operations and the pipeline stage time is reached, the contents of the stage 1 register file are transferred to the stage 2 register file in pipeline stage 2. As a non-limiting example, the register files in adjacent pipeline stages transfer data by means of a register space copy.
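The stage-to-stage transfer can be sketched as a copy of one register file's contents into the next. In this illustration each register file is a plain Python list and `advance` is a hypothetical helper. The copy runs from the last stage backwards only because the sketch is sequential and must not overwrite data before it has been forwarded. Real hardware could perform all copies at once, and the text does not prescribe an order.

```python
def advance(regfiles):
    """Copy each stage's register contents into the next stage's register file.

    regfiles[k] holds the registers of stage k + 1. Stage 1 keeps its old
    contents until new data is loaded into it.
    """
    for k in range(len(regfiles) - 1, 0, -1):
        regfiles[k] = list(regfiles[k - 1])  # register space copy


regfiles = [[1, 2], [3, 4], [5, 6]]
advance(regfiles)
print(regfiles)  # [[1, 2], [1, 2], [3, 4]]
```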
In addition, the processor array of the present embodiment comprises an initial register file, which is connected to the register file in the first pipeline stage (in this embodiment, the stage 1 register file) and stores the raw data to be processed. When the processor array starts, the stored raw data is transferred into the stage 1 register file. Pipeline stage 1 is then triggered by a start control instruction, and each of its processor units PU begins operating on the raw data according to its preset instructions.
The processor array of the present embodiment also comprises a result register file, which is connected to the register file in the last pipeline stage (in this embodiment, the stage N register file). The stage N register file transfers the operation results into the result register file. The result register file can also notify external devices to fetch the operation results.
The processor array of the present embodiment further comprises a global pipeline stage control module (not shown in Fig. 1) for controlling the startup of each pipeline stage and the transitions between pipeline stages. When the processor array starts, once the global pipeline stage control module receives the start instruction from the preceding pipeline stage, it starts all processor units PU in the next pipeline stage so that they begin working simultaneously. The global pipeline stage control module can also contain a counting clock: after it has counted a preset number of cycles (that is, the pipeline stage time), it triggers the register file in each pipeline stage to transfer its data to the register file in the next pipeline stage.
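The counting clock in the control module can be sketched as a cycle counter that fires a callback once per pipeline stage time. The class `GlobalStageController` and its `on_boundary` callback are hypothetical names used for this illustration only. What the callback does (for example, triggering the register file transfers) is left to the caller.

```python
class GlobalStageController:
    """Counts clock cycles and signals a stage boundary every stage_time cycles."""

    def __init__(self, stage_time, on_boundary):
        self.stage_time = stage_time    # preset number of cycles per stage
        self.on_boundary = on_boundary  # called when the stage time is reached
        self.count = 0
        self.boundaries = 0

    def tick(self):
        """Advance the counting clock by one cycle."""
        self.count += 1
        if self.count == self.stage_time:
            self.count = 0
            self.boundaries += 1
            self.on_boundary()


events = []
ctrl = GlobalStageController(4, lambda: events.append("transfer"))
for _ in range(10):
    ctrl.tick()
print(ctrl.boundaries, ctrl.count, events)  # 2 2 ['transfer', 'transfer']
```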
Still with reference to Fig. 1, the operation of the processor array of the present embodiment is described in detail below.
First, those skilled in the art will appreciate that the processor array shown in Fig. 1 can also be surrounded by memory components for loading instructions and data. To ensure that the processor array can process repetitive data operations at high speed, the instructions required by each processor unit PU are usually loaded into that processor unit in advance.
The raw data to be processed by the processor array is prepared and loaded into the initial register file. Once loading is complete, the pipeline stages of the processor array can be started. Pipeline stage 1 is started first: the raw data in the initial register file is transferred into the stage 1 register file, and all processor units PU in pipeline stage 1 begin working, performing the corresponding operations on the raw data in the stage 1 register file. If the processor units PU in pipeline stage 1 need to exchange data during processing, they can do so by writing the data into the stage 1 register file. While pipeline stage 1 is processing, the initial register file can already load new data to be processed.
After pipeline stage 1 has finished processing, the next pipeline stage, pipeline stage 2, is started. This process can be scheduled globally by the global pipeline stage control module. At this point, all or part of the contents of the stage 1 register file is transferred into the stage 2 register file for further processing, which keeps the whole system working as a pipeline and improves efficiency.
After the last pipeline stage (pipeline stage N) has completed its processing, the final result data is transferred into the result register file. Peripheral data read/write components can then read the result data from the result register file.
In this way, high-speed, parallel, pipelined operation of the whole processor array is achieved.
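The workflow above can be sketched as a small software simulation. This is a simplified model under stated assumptions, not the patented hardware: each register file is reduced to a single Python value, each stage's processor units are collapsed into one function, `None` marks a stage that currently holds no valid data, and one loop iteration stands for one pipeline stage time. The name `run_pipeline` and the three example stage functions are hypothetical.

```python
def run_pipeline(stage_fns, inputs):
    """Stream inputs through the stages and return the collected results."""
    n = len(stage_fns)
    stage_regs = [None] * n   # one register file per pipeline stage
    pending = list(inputs)    # data waiting to enter the initial register file
    results = []              # contents read out of the result register file

    while pending or any(r is not None for r in stage_regs):
        # Stage boundary: the last stage hands its data to the result register file.
        if stage_regs[-1] is not None:
            results.append(stage_regs[-1])
        # Every other stage copies its register file into the next stage.
        for k in range(n - 1, 0, -1):
            stage_regs[k] = stage_regs[k - 1]
        # The initial register file feeds stage 1 with the next raw data item.
        stage_regs[0] = pending.pop(0) if pending else None
        # All stages then process their own register file during this stage time.
        stage_regs = [fn(r) if r is not None else None
                      for fn, r in zip(stage_fns, stage_regs)]
    return results


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(stages, [1, 2, 3]))  # [1, 3, 5]
```

In the example run, the second input enters stage 1 while the first input is being processed in stage 2, which is the overlap that the walkthrough describes.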
It should be noted that, to use the processor array as efficiently as possible, and setting aside the time needed for loading and storing data, the number of instructions executed by each pipeline stage needs to be balanced sensibly so as to maximize overall efficiency.
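The effect of balancing can be illustrated with a rough estimate. The sketch assumes one instruction per cycle, takes each stage's instruction count to be that of its busiest processor unit, and sets the common pipeline stage time to the largest of these counts. None of these assumptions is stated in the patent, and the function names and figures are hypothetical.

```python
def common_stage_time(instr_counts):
    """The shared stage time must cover the stage with the most instructions."""
    return max(instr_counts)


def utilization(instr_counts):
    """Fraction of the available stage cycles that is spent on useful work."""
    t = common_stage_time(instr_counts)
    return sum(instr_counts) / (t * len(instr_counts))


balanced = [100, 100, 100, 100]
unbalanced = [40, 160, 100, 100]  # same total work, unevenly distributed
print(common_stage_time(balanced), utilization(balanced))      # 100 1.0
print(common_stage_time(unbalanced), utilization(unbalanced))  # 160 0.625
```

With the same total instruction count, the unbalanced split forces a longer stage time and leaves the lightly loaded stages idle for part of every stage.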
In summary, the processor array of the present embodiment has the following advantages:
Because each pipeline stage shares one register file, the multiple processor units within a pipeline stage can share data through that register file while each processor unit carries out its own operations in parallel.
Running multiple pipeline stages in parallel effectively reduces processing time, since computation that would otherwise take many cycles is split across multiple pipeline stages with shorter cycle times. The pipeline stages work independently and simultaneously, and computation data can be transferred between them by means of a register space copy.
With these two measures, the processor array of the present embodiment is formed from multiple processor units within each pipeline stage and from multiple pipeline stages, so that all processor units in the array can work independently at the same point in time, which effectively improves overall processing capability.
Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit the invention. Any person skilled in the art can make possible variations and modifications without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by its claims.
Claims (7)
1. A processor array based on shared registers and pipelined processing, characterized in that it comprises:
Multiple processor units, divided among multiple pipeline stages, where each pipeline stage contains one or more processor units and the processor units of different pipeline stages are independent of one another;
Multiple register files, distributed among the multiple pipeline stages, where each pipeline stage contains one register file, the processor units within a pipeline stage exchange data through that stage's register file, the register file in each pipeline stage passes its data, stage by stage, to the register file in the next pipeline stage, and once the current pipeline stage has finished processing its data, the contents of its register file are transferred to the register file in the next pipeline stage.
2. The processor array according to claim 1, characterized in that it further comprises:
An initial register file, connected to the register file in the first of the multiple pipeline stages and used to store the raw data to be processed, where, when the processor array starts, the raw data is transferred to the register file of the first pipeline stage.
3. The processor array according to claim 1, characterized in that it further comprises:
A result register file, connected to the register file in the last of the multiple pipeline stages, where the register file in the last pipeline stage transfers its data to the result register file.
4. The processor array according to any one of claims 1 to 3, characterized in that it further comprises:
A global pipeline stage control module for controlling the startup of the multiple pipeline stages and the transitions between pipeline stages.
5. The processor array according to any one of claims 1 to 3, characterized in that the multiple pipeline stages have the same pipeline stage time.
6. The processor array according to any one of claims 1 to 3, characterized in that the register files in adjacent pipeline stages transfer data by means of a register space copy.
7. The processor array according to any one of claims 1 to 3, characterized in that the processor unit comprises a MIPS core, an ARM core, or a DSP core.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310027755.6A CN103106175B (en) | 2013-01-23 | 2013-01-23 | Processor array based on shared registers and pipelined processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310027755.6A CN103106175B (en) | 2013-01-23 | 2013-01-23 | Processor array based on shared registers and pipelined processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103106175A CN103106175A (en) | 2013-05-15 |
CN103106175B (en) | 2015-12-23 |
Family
ID=48314045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310027755.6A Active CN103106175B (en) | 2013-01-23 | 2013-01-23 | Processor array based on shared registers and pipelined processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103106175B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107678781B (en) * | 2016-08-01 | 2021-02-26 | 北京百度网讯科技有限公司 | Processor and method for executing instructions on processor |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1072788A (en) * | 1991-11-27 | 1993-06-02 | 国际商业机器公司 | The computer system of dynamic multi-mode parallel processor array architecture |
CN101021830A (en) * | 2007-03-29 | 2007-08-22 | 中国人民解放军国防科学技术大学 | Method for multi-nuclear expansion in flow processor |
CN101526897A (en) * | 2009-01-22 | 2009-09-09 | 杭州中天微系统有限公司 | High speed associate processor interface of embedded processor |
CN101944013A (en) * | 2008-12-31 | 2011-01-12 | 英特尔公司 | Processor extensions for execution of secure embedded containers |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6948051B2 (en) * | 2001-05-15 | 2005-09-20 | International Business Machines Corporation | Method and apparatus for reducing logic activity in a microprocessor using reduced bit width slices that are enabled or disabled depending on operation width |
US7007128B2 (en) * | 2004-01-07 | 2006-02-28 | International Business Machines Corporation | Multiprocessor data processing system having a data routing mechanism regulated through control communication |
Also Published As
Publication number | Publication date |
---|---|
CN103106175A (en) | 2013-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101833441B (en) | Parallel vector processing engine structure | |
CN105487838A (en) | Task-level parallel scheduling method and system for dynamically reconfigurable processor | |
CN103377157B (en) | A kind of double-core data communications method for built-in digital control system | |
WO2022161318A1 (en) | Data processing device and method, and related products | |
CN102122275A (en) | Configurable processor | |
CN105183698A (en) | Control processing system and method based on multi-kernel DSP | |
CN104317770A (en) | Data storage structure and data access method for multiple core processing system | |
KR101639853B1 (en) | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines | |
CN102087609A (en) | Dynamic binary translation method under multi-processor platform | |
CN112199173A (en) | Data processing method for dual-core CPU real-time operating system | |
CN104657111A (en) | Parallel computing method and device | |
CN103984677A (en) | Embedded reconfigurable system based on large-scale coarseness and processing method thereof | |
CN106648758A (en) | Multi-core processor BOOT starting system and method | |
CN110991619A (en) | Neural network processor, chip and electronic equipment | |
CN102360281B (en) | Multifunctional fixed-point media access control (MAC) operation device for microprocessor | |
CN101751373A (en) | Configurable multi-core/many core system based on single instruction set microprocessor computing unit | |
CN106293736B (en) | Two-stage programmer and its calculation method for coarseness multicore computing system | |
CN103106175B (en) | Processor array based on shared registers and pipelined processing | |
CN103761213A (en) | On-chip array system based on circulating pipeline computation | |
Stepchenkov et al. | Recurrent data-flow architecture: features and realization problems | |
CN102023846B (en) | Shared front-end assembly line structure based on monolithic multiprocessor system | |
CN111381882B (en) | Data processing device and related product | |
CN111078286B (en) | Data communication method, computing system and storage medium | |
CN117222991A (en) | Network-on-chip processing system | |
CN102393814B (en) | A kind of system being generated dynamic reconfigurable processor configuration information by software mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |