
CN103106175B - Processor array based on shared registers and pipelined processing


Info

Publication number
CN103106175B
Authority
CN
China
Prior art keywords
pipelining
stage
register file
processor
processor array
Legal status
Active
Application number
CN201310027755.6A
Other languages
Chinese (zh)
Other versions
CN103106175A (en)
Inventor
赵光焕
胡志卷
胡红旗
刘君敏
Current Assignee
Hangzhou Silan Microelectronics Co Ltd
Original Assignee
Hangzhou Silan Microelectronics Co Ltd
Priority date
2013-01-23
Filing date
2013-01-23
Publication date
2015-12-23
Application filed by Hangzhou Silan Microelectronics Co Ltd
Priority to CN201310027755.6A
Publication of CN103106175A
Application granted
Publication of CN103106175B
Legal status: Active

Landscapes

  • Multi Processors (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a processor array based on shared registers and pipelined processing, comprising: a plurality of processor units divided among a plurality of pipeline stages, each pipeline stage containing one or more processor units, with processor units in different pipeline stages independent of one another; and a plurality of register files distributed across the pipeline stages, one register file per pipeline stage. Processor units in the same pipeline stage exchange data through that stage's register file, and the register file of each stage passes data, stage by stage, to the register file of the next pipeline stage. The invention enables multiple processor units to work cooperatively, which helps improve the computational efficiency of the processor array.

Description

Processor array based on shared registers and pipelined processing
Technical field
The present invention relates to multiprocessor array technology, and in particular to a processor array based on shared registers and pipelined processing.
Background
A multiprocessor system uses two or more computing units, which can communicate over a bus or an interconnection network. A processor array is an array made up of a larger number of processor units, in which a single control unit directs every processor unit in the array to perform the relevant operations on its own data.
Demand for real-time complex computation keeps growing, particularly in video and artificial intelligence, and calls for computing resources on a much larger scale. Conventional single-processor or multiprocessor units can neither satisfy computation requirements of this scale nor meet the need for fast real-time response.
Parallel computing hardware such as processor arrays is therefore needed to raise computational efficiency. However, when a processor array contains many processor units, a better scheme is needed to make those units work together.
Summary of the invention
The technical problem addressed by the present invention is to provide a processor array based on shared registers and pipelined processing that enables multiple processor units to work cooperatively, thereby improving the computational efficiency of the processor array.
To solve this technical problem, the invention provides a processor array based on shared registers and pipelined processing, comprising:
a plurality of processor units divided among a plurality of pipeline stages, each pipeline stage containing one or more processor units, wherein processor units in different pipeline stages are independent of one another; and
a plurality of register files distributed across the plurality of pipeline stages, each pipeline stage containing one register file, wherein processor units in the same pipeline stage exchange data through the register file of that stage, and the register file of each stage passes data, stage by stage, to the register file of the next pipeline stage.
According to one embodiment of the invention, the processor array further comprises an initial register file connected to the register file of the first of the pipeline stages. It stores the raw data to be processed, and when the processor array starts, the raw data is passed to the register file of the first pipeline stage.
According to one embodiment of the invention, the processor array further comprises a result register file connected to the register file of the last of the pipeline stages; the register file of the last pipeline stage passes data to the result register file.
According to one embodiment of the invention, the processor array further comprises a global pipeline-stage control module that controls the start-up of the pipeline stages and the transitions between them.
According to one embodiment of the invention, the plurality of pipeline stages have the same pipeline-stage time.
According to one embodiment of the invention, register files in adjacent pipeline stages transfer data by copying register space.
According to one embodiment of the invention, the processor units include MIPS cores, ARM cores, or DSP cores.
Compared with the prior art, the present invention has the following advantages:
The processor array of the embodiments organizes its processor units into multiple pipeline stages. Processor units in the same stage exchange data through that stage's register file; processor units in different stages do not communicate directly, and data moves between adjacent stages through their register files. This lets the whole processor array work together smoothly and helps improve processing efficiency.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the processor array according to an embodiment of the present invention.
Detailed description of embodiments
The invention is further described below with reference to specific embodiments and the drawings, which should not be taken to limit the scope of protection of the invention.
Referring to Fig. 1, the processor array of this embodiment comprises a plurality of processor units PU. A processor unit PU can be any programmable logic unit in the general sense that can perform arithmetic, logic, shift, multiply-accumulate and other operations; for example, it can be a MIPS core, an ARM core, or a DSP core, but is not limited to these.
The processor units PU are divided into a plurality of pipeline stages (Stage): pipeline stage 1, pipeline stage 2, ..., pipeline stage N, where N is a positive integer. Processor units PU in different pipeline stages are independent of one another and cannot exchange data directly; in other words, there is no direct communication connection between processor units PU in different pipeline stages.
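The stage partitioning can be read as a simple access rule: a PU may touch only the register file of its own stage, so two PUs can share data only if they sit in the same stage. The following is a minimal hypothetical Python model of that rule; `ProcessorUnit`, `can_access` and `can_exchange_directly` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorUnit:
    """Hypothetical model of a PU tagged with the pipeline stage it belongs to."""
    name: str
    stage: int  # 1..N


def can_access(pu: ProcessorUnit, register_file_stage: int) -> bool:
    """A PU may read/write only the register file of its own pipeline stage."""
    return pu.stage == register_file_stage


def can_exchange_directly(a: ProcessorUnit, b: ProcessorUnit) -> bool:
    """PUs share data only through a common register file, i.e. within one stage."""
    return a.stage == b.stage


pu_a = ProcessorUnit("PU1_1", stage=1)
pu_b = ProcessorUnit("PU1_2", stage=1)
pu_c = ProcessorUnit("PU2_1", stage=2)
print(can_exchange_directly(pu_a, pu_b))  # True
print(can_exchange_directly(pu_a, pu_c))  # False
```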
Using multiple pipeline stages can effectively reduce processing time: computation that would take many cycles is split across several pipeline stages with short cycle times, and the stages run in parallel, each working independently.
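As a rough illustration of why splitting the work helps, assume each of N stages takes T cycles and a new data set enters stage 1 every T cycles. Then k data sets finish after (N + k - 1)·T cycles, compared with N·T·k cycles if the N steps of each data set ran one after another without overlap. This is the textbook ideal-pipeline model, not a figure from the patent, and it ignores data loading and storing.

```python
from typing import List


def pipelined_cycles(num_stages: int, stage_time: int, num_sets: int) -> int:
    """Ideal pipeline: the first set takes num_stages periods, each later set adds one."""
    if num_sets <= 0:
        return 0
    return (num_stages + num_sets - 1) * stage_time


def sequential_cycles(num_stages: int, stage_time: int, num_sets: int) -> int:
    """Same work without overlap: each set runs all stages before the next one starts."""
    return num_stages * stage_time * max(num_sets, 0)


# Example: 4 stages of 100 cycles each, 10 data sets.
print(pipelined_cycles(4, 100, 10))   # 1300
print(sequential_cycles(4, 100, 10))  # 4000
```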
Each pipeline stage contains one register file: pipeline stage 1 has the pipeline stage 1 register file, pipeline stage 2 has the pipeline stage 2 register file, ..., and pipeline stage N has the pipeline stage N register file. Data is transferred between pipeline stages through these register files. The execution time of each pipeline stage is set before start-up; in a preferred embodiment, all pipeline stages have the same pipeline-stage time. A uniform pipeline-stage time ensures that no pipeline stage overflows.
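Because all stages share one stage time fixed before start-up, that time must be at least as long as the slowest stage's work; otherwise a stage would still be computing when its register file is handed on, which is how we read the "overflow" mentioned above. A minimal check, assuming per-stage worst-case cycle counts are known in advance (the counts below are made up for illustration):

```python
from typing import List


def min_stage_time(stage_cycles: List[int]) -> int:
    """Smallest uniform stage time that lets every stage finish its work."""
    if not stage_cycles:
        raise ValueError("need at least one pipeline stage")
    return max(stage_cycles)


def overrunning_stages(stage_cycles: List[int], stage_time: int) -> List[int]:
    """1-based indices of stages whose work would not fit in stage_time cycles."""
    return [i + 1 for i, cycles in enumerate(stage_cycles) if cycles > stage_time]


worst_case = [80, 120, 95]  # hypothetical worst-case cycles for stages 1..3
print(min_stage_time(worst_case))           # 120
print(overrunning_stages(worst_case, 100))  # [2]
```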
Processor units PU in the same pipeline stage exchange data through that stage's register file, and all data storage within a pipeline stage, including intermediate variables and results, takes place in that stage's register file. For example, each processor unit PU in pipeline stage 1 is connected to the pipeline stage 1 register file and writes any data it needs to exchange into that register file, where it is shared with the other processor units PU in pipeline stage 1.
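Intra-stage cooperation through the shared register file might look like the sketch below. Assumptions: registers hold integers, and the two PU programs `pu1_1` and `pu1_2` are hypothetical. In hardware the PUs run concurrently and dependent accesses would need some ordering mechanism that the patent does not detail; the sketch simply runs them in order so that `pu1_2` reads the intermediate value `pu1_1` left in the register file.

```python
class RegisterFile:
    """Hypothetical stage register file: a flat array of integer registers."""

    def __init__(self, size: int) -> None:
        self.regs = [0] * size

    def read(self, index: int) -> int:
        return self.regs[index]

    def write(self, index: int, value: int) -> None:
        self.regs[index] = value


def pu1_1(rf: RegisterFile) -> None:
    # Computes an intermediate value and publishes it in r2 for other PUs.
    rf.write(2, rf.read(0) + rf.read(1))


def pu1_2(rf: RegisterFile) -> None:
    # Consumes PU1_1's intermediate value from the shared file; result goes to r3.
    rf.write(3, rf.read(2) * 2)


stage1_rf = RegisterFile(4)
stage1_rf.write(0, 3)  # raw data
stage1_rf.write(1, 4)
for pu in (pu1_1, pu1_2):
    pu(stage1_rf)
print(stage1_rf.regs)  # [3, 4, 7, 14]
```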
The register file of each pipeline stage passes data, stage by stage, to the register file of the next stage: after the current pipeline stage finishes its data processing, the content of its register file is passed to the register file of the next pipeline stage. For example, the processor units PU in pipeline stage 1 complete their respective operations, and when the pipeline-stage time is reached, the content of the pipeline stage 1 register file is passed to the pipeline stage 2 register file. As a non-limiting example, register files in adjacent pipeline stages transfer data by copying register space.
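A handoff by register-space copy amounts to copying a block of one stage's register file into the next stage's file when the stage time expires. The helper below is a hypothetical sketch; its `start` and `length` parameters are our own addition, reflecting the later remark that all or part of a register file's content may be passed on.

```python
from typing import List, Optional


def copy_register_space(src: List[int], dst: List[int], start: int = 0,
                        length: Optional[int] = None) -> None:
    """Copy registers src[start:start+length] into the same positions of dst.

    By default the whole source register file is copied (hypothetical helper).
    """
    if length is None:
        length = len(src) - start
    end = start + length
    if start < 0 or length < 0 or end > len(src) or end > len(dst):
        raise IndexError("register range out of bounds")
    dst[start:end] = src[start:end]


stage1_rf = [3, 4, 7, 14]
stage2_rf = [0, 0, 0, 0]
copy_register_space(stage1_rf, stage2_rf, start=2, length=2)  # pass only r2..r3
print(stage2_rf)  # [0, 0, 7, 14]
```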
In addition, the processor array of this embodiment includes an initial register file connected to the register file of the first pipeline stage (in this embodiment, the pipeline stage 1 register file). It stores the raw data to be processed, and when the processor array starts, the stored raw data is passed into the pipeline stage 1 register file. Pipeline stage 1 is then triggered by a start control instruction, and each of its processor units PU begins operating on the raw data according to its preset instructions.
The processor array of this embodiment also includes a result register file connected to the register file of the last pipeline stage (in this embodiment, the pipeline stage N register file), which passes the operation results into the result register file. The result register file can also notify an external device to fetch the results.
The processor array of this embodiment also includes a global pipeline-stage control module (not shown in Fig. 1) that controls the start-up of each pipeline stage and the transitions between stages. When the processor array starts, once the global pipeline-stage control module receives the start instruction from the preceding pipeline stage, it starts all processor units PU in the next pipeline stage so that they begin working simultaneously. The global pipeline-stage control module may also include a counting clock: after a preset number of cycles (the pipeline-stage time) has elapsed, it triggers the register file in each pipeline stage to pass its data to the register file in the next stage.
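The counting clock can be pictured as a counter that signals a register-file transfer each time it reaches the configured pipeline-stage time and then restarts. This is a sketch only; `StageCounter` and `tick` are hypothetical names.

```python
class StageCounter:
    """Hypothetical cycle counter of the global pipeline-stage control module."""

    def __init__(self, stage_time: int) -> None:
        if stage_time <= 0:
            raise ValueError("stage time must be positive")
        self.stage_time = stage_time
        self.count = 0

    def tick(self) -> bool:
        """Advance one clock cycle; return True when register files should be transferred."""
        self.count += 1
        if self.count == self.stage_time:
            self.count = 0
            return True
        return False


counter = StageCounter(stage_time=4)
fired = [cycle for cycle in range(1, 11) if counter.tick()]
print(fired)  # [4, 8]
```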
Still referring to Fig. 1, the operation of the processor array of this embodiment is described in detail below.
First, those skilled in the art will appreciate that memory units for loading instructions and data may surround the processor array shown in Fig. 1. To ensure that the processor array can process repetitive computational data at high speed, the instructions required by each processor unit PU are usually loaded into that processor unit in advance.
The raw data to be processed by the processor array is prepared and loaded into the initial register file; once loading is complete, the pipeline stages of the processor array can be started. Pipeline stage 1 starts first: the raw data in the initial register file is passed into the pipeline stage 1 register file, and all processor units PU in pipeline stage 1 begin operating on it. If the processor units PU in pipeline stage 1 need to exchange data during processing, they write their respective data into the pipeline stage 1 register file. While pipeline stage 1 is processing, the initial register file can already load the next data to be processed.
When pipeline stage 1 finishes, the next pipeline stage, pipeline stage 2, is started; this scheduling can be controlled globally by the global pipeline-stage control module. At this point, all or part of the content of the pipeline stage 1 register file is passed into the pipeline stage 2 register file and processing continues, keeping the whole system working as a pipeline and improving efficiency.
After the last pipeline stage (pipeline stage N) completes its processing, the final result data is passed into the result register file, from which the peripheral data read/write components can then read the results.
In this way, the whole processor array operates at high speed, in parallel, and in a pipelined fashion.
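The walkthrough above (load the initial register file, advance every stage at the end of each stage time, drain into the result register file) can be simulated in a few lines. This is a sketch under simplifying assumptions, not the patent's control logic: all PUs of a stage are collapsed into one hypothetical Python function that updates the stage's register file in place, a register file is a list of integers, and one loop iteration stands for one pipeline-stage time.

```python
from collections import deque
from typing import Callable, List, Optional

Regs = List[int]
StageFn = Callable[[Regs], None]  # hypothetical: all PUs of one stage, updating its file in place


def run_pipeline(stages: List[StageFn], batches: List[Regs]) -> List[Regs]:
    """Simulate the array period by period; one loop iteration = one pipeline-stage time."""
    if not stages:
        raise ValueError("need at least one pipeline stage")
    initial = deque(batches)                            # initial register file (queued raw data)
    files: List[Optional[Regs]] = [None] * len(stages)  # one register file per stage
    results: List[Regs] = []                            # what reaches the result register file
    while initial or any(f is not None for f in files):
        # End of a stage time: the last stage hands its file to the result register file.
        if files[-1] is not None:
            results.append(files[-1])
        # Each stage copies its register space into the next one, back to front,
        # so no stage's content is overwritten before it has been passed on.
        for i in range(len(files) - 1, 0, -1):
            prev = files[i - 1]
            files[i] = list(prev) if prev is not None else None
        # Stage 1 receives the next raw data set from the initial register file.
        files[0] = list(initial.popleft()) if initial else None
        # All stages compute; in hardware in parallel, here one after another.
        for stage_fn, regs in zip(stages, files):
            if regs is not None:
                stage_fn(regs)
    return results


def stage1(regs: Regs) -> None:
    regs[1] = regs[0] + 1  # hypothetical stage-1 work


def stage2(regs: Regs) -> None:
    regs[2] = regs[1] * 10  # hypothetical stage-2 work


print(run_pipeline([stage1, stage2], [[1, 0, 0], [2, 0, 0], [3, 0, 0]]))
# [[1, 2, 20], [2, 3, 30], [3, 4, 40]]
```

Copying from the last stage backwards mirrors the requirement that each stage's content moves exactly one stage per stage time.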
Note that, to use the processor array as efficiently as possible (setting aside the time needed to load and store data), the number of instructions executed in each pipeline stage should be distributed sensibly so as to maximize overall efficiency.
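One way to quantify a sensible distribution of instruction counts, assuming one instruction per cycle and a shared stage time equal to the longest stage, is the average fraction of the stage time that stages spend doing useful work. Perfectly balanced stages score 1.0. The metric and the numbers below are our own illustration, not from the patent.

```python
from typing import List


def stage_utilization(stage_instructions: List[int]) -> float:
    """Average busy fraction of all stages when the shared stage time equals the longest stage.

    Assumes one instruction per cycle and ignores data loading/storing.
    """
    if not stage_instructions:
        raise ValueError("need at least one pipeline stage")
    stage_time = max(stage_instructions)
    if stage_time <= 0:
        raise ValueError("stages must execute at least one instruction")
    return sum(stage_instructions) / (len(stage_instructions) * stage_time)


print(stage_utilization([100, 100, 100]))           # 1.0
print(round(stage_utilization([50, 100, 150]), 3))  # 0.667
```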
In summary, the processor array of this embodiment has the following advantages:
By sharing a register file within each pipeline stage, the processor units of that stage can share data through a single register file while each still performs its own, different computation in parallel.
By running multiple pipeline stages in parallel, processing time is effectively reduced: computation that would take many cycles is split across several pipeline stages with short cycle times. The pipeline stages work independently and simultaneously, and computation data can be transferred between them by copying register space.
Combining these two measures, the processor array of this embodiment is organized as multiple pipeline stages, each containing multiple processor units, so that all processor units in the array can work independently at the same time, which effectively improves overall processing capacity.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may make possible variations and modifications without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be determined by the claims.

Claims (7)

1. A processor array based on shared registers and pipelined processing, characterized by comprising:
a plurality of processor units divided among a plurality of pipeline stages, each pipeline stage containing one or more processor units, wherein processor units in different pipeline stages are independent of one another; and
a plurality of register files distributed across the plurality of pipeline stages, each pipeline stage containing one register file, wherein processor units in the same pipeline stage exchange data through the register file of that stage, and the register file of each stage passes data, stage by stage, to the register file of the next pipeline stage: after the current pipeline stage completes its data processing, the content of its register file is passed to the register file of the next pipeline stage.
2. The processor array according to claim 1, further comprising:
an initial register file connected to the register file of the first of the plurality of pipeline stages and configured to store raw data to be processed, wherein when the processor array starts, the raw data is passed to the register file of the first pipeline stage.
3. The processor array according to claim 1, further comprising:
a result register file connected to the register file of the last of the plurality of pipeline stages, wherein the register file of the last pipeline stage passes data to the result register file.
4. The processor array according to any one of claims 1 to 3, further comprising:
a global pipeline-stage control module configured to control the start-up of the plurality of pipeline stages and the transitions between them.
5. The processor array according to any one of claims 1 to 3, wherein the plurality of pipeline stages have the same pipeline-stage time.
6. The processor array according to any one of claims 1 to 3, wherein register files in adjacent pipeline stages transfer data by copying register space.
7. The processor array according to any one of claims 1 to 3, wherein the processor units include MIPS cores, ARM cores, or DSP cores.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310027755.6A CN103106175B (en) 2013-01-23 2013-01-23 Based on the processor array of shared register and stream treatment

Publications (2)

Publication Number Publication Date
CN103106175A 2013-05-15
CN103106175B 2015-12-23

Family

ID=48314045

Country Status (1)

Country Link
CN (1) CN103106175B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107678781B (en) * 2016-08-01 2021-02-26 北京百度网讯科技有限公司 Processor and method for executing instructions on processor

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1072788A (en) * 1991-11-27 1993-06-02 国际商业机器公司 The computer system of dynamic multi-mode parallel processor array architecture
CN101021830A (en) * 2007-03-29 2007-08-22 中国人民解放军国防科学技术大学 Method for multi-nuclear expansion in flow processor
CN101526897A (en) * 2009-01-22 2009-09-09 杭州中天微系统有限公司 High speed associate processor interface of embedded processor
CN101944013A (en) * 2008-12-31 2011-01-12 英特尔公司 Processor extensions for execution of secure embedded containers

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US6948051B2 (en) * 2001-05-15 2005-09-20 International Business Machines Corporation Method and apparatus for reducing logic activity in a microprocessor using reduced bit width slices that are enabled or disabled depending on operation width
US7007128B2 (en) * 2004-01-07 2006-02-28 International Business Machines Corporation Multiprocessor data processing system having a data routing mechanism regulated through control communication

Similar Documents

Publication Publication Date Title
CN101833441B (en) Parallel vector processing engine structure
CN105487838A (en) Task-level parallel scheduling method and system for dynamically reconfigurable processor
CN103377157B (en) A kind of double-core data communications method for built-in digital control system
WO2022161318A1 (en) Data processing device and method, and related products
CN102122275A (en) Configurable processor
CN105183698A (en) Control processing system and method based on multi-kernel DSP
CN104317770A (en) Data storage structure and data access method for multiple core processing system
KR101639853B1 (en) Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
CN102087609A (en) Dynamic binary translation method under multi-processor platform
CN112199173A (en) Data processing method for dual-core CPU real-time operating system
CN104657111A (en) Parallel computing method and device
CN103984677A (en) Embedded reconfigurable system based on large-scale coarseness and processing method thereof
CN106648758A (en) Multi-core processor BOOT starting system and method
CN110991619A (en) Neural network processor, chip and electronic equipment
CN102360281B (en) Multifunctional fixed-point media access control (MAC) operation device for microprocessor
CN101751373A (en) Configurable multi-core/many core system based on single instruction set microprocessor computing unit
CN106293736B (en) Two-stage programmer and its calculation method for coarseness multicore computing system
CN103106175B (en) Based on the processor array of shared register and stream treatment
CN103761213A (en) On-chip array system based on circulating pipeline computation
Stepchenkov et al. Recurrent data-flow architecture: features and realization problems
CN102023846B (en) Shared front-end assembly line structure based on monolithic multiprocessor system
CN111381882B (en) Data processing device and related product
CN111078286B (en) Data communication method, computing system and storage medium
CN117222991A (en) Network-on-chip processing system
CN102393814B (en) A kind of system being generated dynamic reconfigurable processor configuration information by software mode

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant