CN102306139A

CN102306139A - Heterogeneous multi-core digital signal processor for orthogonal frequency division multiplexing (OFDM) wireless communication system

Info

Publication number: CN102306139A
Application number: CN201110242603A
Authority: CN
Inventors: 王沁; 徐力; 史少波
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2011-08-23
Filing date: 2011-08-23
Publication date: 2012-01-04

Abstract

The invention provides a heterogeneous multi-core digital signal processor for an orthogonal frequency division multiplexing (OFDM) wireless communication system, and relates to the field of microprocessor system structures. The processor consists of a set of processor cores which are distributed in a row, wherein the processor cores can be divided into different types according to computing capability; the different types of processor cores are mutually connected in an open loop interconnection mode; the processor cores are very long instruction word (VLIW) processors; data transmission among the processor cores is realized by a shared memory; and control signals are transmitted through bus control units in the processor cores and task scheduling buses outside the processor cores. Each processor core can receive task scheduling information from other processor cores, so that the design of a dedicated task scheduling unit or a master control processor is eliminated, and the expandability and simplicity of the multi-core processor are ensured; and the structure of the multi-core processor can effectively accord with the characteristic of wireless communication baseband processing and achieves high performance per watt.

Description

Heterogeneous multi-core digital signal processor for an OFDM wireless communication system

Technical field

The invention belongs to the field of computing and digital communications and relates to a heterogeneous multi-core processor. All processor cores in this multi-core processor are very long instruction word (VLIW) processors. The cores are classified by computing capability, so the processor as a whole is heterogeneous.

Background technology

A multi-core processor integrates two or more complete computing cores on a single processor chip. As applications with different computing characteristics have emerged, multi-core architectures have kept evolving to meet the demands those applications place on the architecture, and many novel multi-core architectures have appeared. In particular, with the rapid growth of wireless communication protocols, architectures for this class of compute-intensive applications have become a focus of high-performance computing research.

Wireless communication has developed continuously in recent years, and new communication standards keep emerging. Communication systems that use OFDM as their main modulation scheme, in particular, are widely used in current and next-generation wireless systems because of their high spectral efficiency and good resistance to multipath interference. This trend places ever higher demands on the capability of existing digital signal processors. A VLIW architecture improves the instruction-level and data-level parallelism of a single processor core, and a multi-core architecture built from VLIW cores can substantially improve the parallel computing capability of the whole digital signal processor. Analysis of communication protocols that use OFDM as their main modulation scheme (such as WiFi, UWB and WiMax) shows that, in baseband processing, data pass between processors as streams, that is, as uninterrupted, continuous, moving queues of data. Most stream data can be processed with vector operations after parallelization, and the successive stages of baseband processing can be abstracted as a stream model whose stages follow one another and are linked by data dependences.

Given these characteristics, the different processing modules of the stream model can each be mapped onto a processor core with matching processing characteristics. A processor designed around the processing characteristics of wireless communication protocols in this way can come as close as possible to the computing capability of an application-specific integrated circuit while still implementing the communication protocol in software, which makes protocol updates and upgrades more flexible.

Against this background, the present invention proposes a heterogeneous multi-core digital signal processor based on VLIW technology to address the problems described above in the current high-performance computing field.

Summary of the invention

To solve the above problems, the object of the invention is to provide a multi-core processor for the baseband processing of wireless communication protocols that mainly use OFDM modulation. The multi-core processor uses VLIW processor cores as its basic building blocks, controls the processor cores through an inter-core interconnect structure, and at the same time lets different processor cores exchange data.

The technical solution of the invention is a heterogeneous multi-core digital signal processor for an OFDM system. The processor comprises a set of processor cores that contains at least one processor core and is laid out in a row. The processor cores are classified by computing capability into different types. Data are transferred between processor cores of different types through shared memories, and the behavior of the processor cores is controlled through bus control units and a task scheduling bus. The number of processor cores in the set may be N (N is a natural number greater than 1).

According to the application characteristics of baseband processing, different processor cores have different computing capabilities to meet the different computing demands of the processing flow. For example, a core for a processing module dominated by vector computation can be given more computing units, whereas a core for a module that mixes scalar and vector computation can meet the application demand with fewer computing units. The processor cores in the set are therefore not all identical: they are divided into types by computing capability and arranged in a certain order according to the corresponding processing modules in the baseband processing flow, forming a complete multi-core processor in a heterogeneous manner. The shared memories outside the processor cores, the bus control units inside the cores, and the task scheduling bus shared by the set of processor cores together constitute the interconnect structure of the processor cores. This interconnect structure is referred to as an open-loop (open-ring) connection.

Further, each processor core comprises a functional unit and a bus control unit. The bus control unit generates the scheduling information for computing tasks, and the functional unit implements the computational logic for digital signals. The bus control unit comprises control logic, arbitration logic, a gate (selector), matching control logic, a task scheduling information register, an arbitration information register and a FIFO memory. The control logic generates task scheduling commands; the arbitration logic judges whether the current task scheduling information is valid for the processor core; the FIFO memory temporarily stores task scheduling information. The functional unit comprises an instruction fetch unit, a decoder, a control unit, a data switching network, computing units, memory interface units, a global register file and private memory.

Further, the processor core adopts a Harvard architecture, that is, data storage and instruction storage are separate. Each processor core contains private memory (Private Mem) of a certain size, which stores the intermediate data the core produces while executing an algorithm program and the fixed coefficients commonly used in digital signal processing algorithms. A shared memory (Share Mem) is attached on the left and on the right of each processor core (the shared memories are located outside the core) and stores the data that the current DSP core is to read in or write out. Each processor core may have two independent private memories, connected to the core through two ports that we call P_x and P_y; both ports support read and write operations. The core is connected to the shared memory on its left through port S_l and to the shared memory on its right through port S_r; these two ports also support read and write operations.

Further, the processor core is a VLIW processor whose instruction word flows in turn through an instruction fetch register, a decode register and an execute register. This three-stage instruction pipeline carries out the complete function of one instruction. In the fetch stage, the processor core fetches the instruction addressed by the PC register and places it in the instruction register. In the decode stage, the instruction in the instruction register is decoded and the decoded result is distributed to each computing unit (CU), the private memory interface unit (P-MIU), the shared memory interface unit (S-MIU) and the data switching network. In the execute stage, the CUs, the private and shared memory interface units and the data switching network perform the corresponding actions: each CU and memory interface unit reads and writes the register file through the data communication unit, and the control unit writes the address of the next instruction back to the PC register.
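The three-stage flow described above can be sketched as a small cycle-by-cycle trace. This is only an illustrative model, not the patent's implementation: the function name `run_pipeline`, the string placeholders for instruction words and the restriction to sequential (branch-free) flow are all assumptions made for the example.

```python
def run_pipeline(program):
    """Trace a 3-stage fetch/decode/execute pipeline, one tuple per clock cycle.

    `program` is a list of instruction words (any objects; strings here).
    Hypothetical model: sequential flow only, no branches or stalls.
    """
    fetch_reg = decode_reg = execute_reg = None
    pc = 0  # program counter
    trace = []
    while pc < len(program) or fetch_reg is not None or decode_reg is not None:
        # All three pipeline registers advance on the same clock edge.
        execute_reg = decode_reg
        decode_reg = fetch_reg
        if pc < len(program):
            fetch_reg = program[pc]
            pc += 1  # stands in for the control unit writing the next address to PC
        else:
            fetch_reg = None
        trace.append((fetch_reg, decode_reg, execute_reg))
    return trace


trace = run_pipeline(["w0", "w1", "w2", "w3"])
for cycle, stages in enumerate(trace, start=1):
    print(cycle, stages)
```

In this model, once the pipeline is full one instruction word reaches the execute register per cycle, so a straight-line program of k words drains in k + 2 cycles.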

Further, the instruction word consists of a flag field, an immediate constant field, a control field and CU fields. The CU fields control the computing units and memory interface units: every computing unit and memory interface unit in the processor core occupies one instruction field, and together these instruction fields make up the CU fields, which form the main part of the very long instruction word. Each individual CU field comprises a CU flag field, a right operand field, a left operand field, a destination address field and an operation field. The VLIW format is not uniform across the set of processor cores: cores with different computing unit configurations have different VLIW formats.

In the VLIW format, the flag field indicates whether the current instruction word is a special instruction. A special instruction initializes 14 global registers, of which the first 7 hold commonly used addresses and the last 7 hold commonly used immediates. The immediate constant field holds one immediate used in the current instruction word cycle. The control field controls the data switching network and indicates which CUs or MIUs exchange data in the current instruction word cycle. Each computing unit or memory interface unit has one CU field (a memory interface unit, MIU, is a special kind of CU). Each CU field consists of five fields: the CU flag field indicates which CU or MIU the current microcode belongs to; the right operand and left operand fields give the register addresses of the right and left operands of the current CU; the destination address field gives the register address of the result; and the operation field indicates which type of operation the current CU performs.
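As a rough illustration of the instruction word layout, the fields can be written down as plain data structures. The text does not give any bit widths, so none are modelled here; the class and attribute names (`CUField`, `VLIWWord`, `empty_word`, `global_register_init`) are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CUField:
    cu_flag: int        # which CU or MIU this microcode belongs to
    right_operand: int  # register address of the right operand
    left_operand: int   # register address of the left operand
    dest: int           # register address of the result
    op: int             # which type of operation the unit performs


@dataclass
class VLIWWord:
    flag: bool      # True marks a special (global register init) instruction
    immediate: int  # one immediate used in this instruction word cycle
    control: int    # steers the data switching network
    cu_fields: List[CUField] = field(default_factory=list)


def empty_word(num_units):
    """Build a word with one CU field per computing or memory interface unit."""
    units = [CUField(cu_flag=i, right_operand=0, left_operand=0, dest=0, op=0)
             for i in range(num_units)]
    return VLIWWord(flag=False, immediate=0, control=0, cu_fields=units)


def global_register_init(addresses, immediates):
    """Contents of the 14 global registers: 7 addresses, then 7 immediates."""
    if len(addresses) != 7 or len(immediates) != 7:
        raise ValueError("expected 7 addresses and 7 immediates")
    return list(addresses) + list(immediates)
```

Because every unit contributes one CU field, a core with more computing units has a longer instruction word, which is why the word format differs between core types.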

Further, the computing units of each processor core are configured according to the practical characteristics of OFDM baseband processing. Processing modules with strict real-time requirements can be implemented on high-performance DSP cores with more computing units, while modules with lower real-time requirements can be implemented on low-power DSP cores with fewer computing units, so as to reach the best performance-to-power ratio. For a particular physical layer protocol based mainly on OFDM modulation, one concrete configuration of the processor cores (numbered from left to right) is as follows: cores 1, 2, 5 and 6 share one configuration, each with 4 arithmetic logic units, 2 multiply-accumulate units and 2 shift units; cores 3 and 7 share one configuration, each with 2 arithmetic logic units, 2 multiply-accumulate units and 2 shift units; cores 4 and 8 share one configuration, each with 1 arithmetic logic unit, 2 multiply-accumulate units and 2 shift units; core 9 has its own configuration, with 1 arithmetic logic unit, 1 shift unit and a Viterbi hardware accelerator.

Further, the processor core interconnect structure, the open-loop interconnect structure, is based on synchronous data flow characteristics and treats data transfer between processor cores and control information transfer as two separate kinds of operation. For data transfer, the structure relies mainly on the switch unit in each processor core to forward data between different shared memories. If a processor core forwards data to another processor core, the data must pass through all the shared memories between the two cores. Control of the processor cores relies on the bus control unit inside each core and the task scheduling bus outside the cores. The bus control unit is private to each processor core and receives the task scheduling information that controls the current core; the task scheduling bus carries task scheduling information and is shared by the processor cores.

Further, in the open-loop interconnect structure, a processor core uses its bus control unit to generate the arbitration signal for claiming the task scheduling bus and the task scheduling information that controls other processor cores. The bus control unit comprises control logic, arbitration logic (Arbiter Logic), a FIFO memory, a gate, and registers that hold the scheduling information and the arbitration signal. The control logic generates task scheduling commands, the arbitration logic judges whether the current task scheduling information is valid for the processor core, and the FIFO memory stores task scheduling information. The bus control units and the task scheduling bus are used as follows. Within the set of processor cores, priority is left-first: a processor core further to the left takes precedence in occupying the task scheduling bus. When a processor core needs to request the task scheduling bus, the control logic in its bus control unit generates a request and submits it to the arbitration logic. If the task scheduling bus is currently occupied, the request is stored in the FIFO memory; if it is not occupied, the request succeeds. Immediately after a processor core has acquired the task scheduling bus, it sends a bus-exclusive message to the processor cores on its left and right. The task scheduling information is F bits wide and identifies the number of the DSP core required by the next task and the program start address that this DSP core is to execute.

Beneficial effects of the invention:

For wireless communication protocols based mainly on OFDM modulation, the invention provides a heterogeneous multi-core processor built from a set of VLIW processor cores to carry out the baseband processing of such protocols. Each processor core in the multi-core processor is optimized separately for the stream model of wireless baseband processing, so the system performance requirements can be met at minimum computing cost. Compared with the prior art, the proposed set of processor cores can be extended according to application demand, and specific processor cores can be designed for the algorithms used in baseband processing. The shared memory design between processor cores lets every core access its adjacent shared memories at the same time, which avoids the limitation of a bus-based memory that only one processor core can access at any moment, and thus guarantees that the cores execute in parallel. Data transfer and control of the processor cores are separated from each other, and each processor core sends task scheduling information itself to determine the execution state of the processor cores. Each processor core can receive task scheduling information from other processor cores, which removes the need for a dedicated task scheduling unit or master control processor and ensures the scalability and simplicity of the multi-core processor. The results of the design example show that the proposed multi-core structure matches the characteristics of wireless baseband processing well and achieves a high performance-to-power ratio.

Description of drawings:

Fig. 1 is the overall architecture diagram of the multi-core processor of the invention.

Fig. 2 is the architecture diagram of the functional unit in a processor core of the invention.

Fig. 3 is a schematic diagram of the pipeline of a processor core of the invention.

Fig. 4 is a schematic diagram of the very long instruction word (VLIW) format of the invention.

Fig. 5 is the architecture diagram of the bus control unit of the invention.

Fig. 6 is a schematic diagram of the switch unit in a processor core of the invention.

In the figures:

1-1. shared memory (receives data from outside); 1-2. shared memory (transfers data between processor cores); 1-3. processor core; 1-4. task scheduling bus; 1-5. data forwarding unit; 1-6. bus control unit; 1-7. functional unit;

2-1. instruction fetch unit; 2-2. decoder; 2-4. control unit; 2-5. data switching network; 2-6. computing unit; 2-7. memory interface unit; 2-8. global register file; 2-9. private memory; 2-10. left shared memory input; 2-11. right shared memory input;

3-1. instruction fetch register; 3-2. decode register; 3-3. execute register; 3-4. data switching network control signal produced by decoding; 3-5. microcode distributed to each CU after decoding; 3-6. data switching network register;

4-1. flag field; 4-2. immediate constant field; 4-3. control field; 4-4. CU fields; 4-5. instruction field; 4-7. CU flag field; 4-8. right operand field; 4-9. left operand field; 4-10. destination address field; 4-11. operation field;

5-1. control logic; 5-2. arbitration logic; 5-3. gate; 5-4. matching control logic; 5-5. task scheduling information register; 5-6. task scheduling information input from the left processor core; 5-7. task scheduling information output to the left processor core; 5-8. task scheduling information output to the right processor core; 5-9. task scheduling information input from the right processor core; 5-10. arbitration signal input; 5-11. arbitration signal output; 5-12. arbitration information register; 5-13. FIFO memory.

Embodiment

The technical solution of the invention is further described below with reference to a specific embodiment.

Application example for OFDM baseband processing

The embodiment is described in three parts. The first part, from the viewpoint of the OFDM wireless baseband application, explains with reference to Fig. 1 how each stage of baseband processing is mapped onto the multi-core processor. The second part, from the multi-core viewpoint, explains with reference to Fig. 1, Fig. 5 and Fig. 6 how the processor cores communicate with one another and work in parallel. The third part, from the single-core viewpoint, explains with reference to Figs. 2, 3 and 4 how a single processor core executes a digital signal processing program.

IEEE 802.11a is one of the wireless local area network (WLAN) standards and uses orthogonal frequency division multiplexing (OFDM) modulation in its physical layer. We take the baseband receiver of this protocol as an example to explain how the heterogeneous multi-core processor is used. Receiver processing has strict real-time requirements on frame handling, and physical layer frames with different functions have different time limits. For example, the processing time limit for an OFDM data frame, which carries the main physical layer information, is 4 µs, so the processor core assigned to process OFDM data frames must finish all its work within 4 µs. Processing a complete IEEE 802.11a physical layer frame requires the following processing modules:

Processing modules for the preamble: frame synchronization, fractional carrier frequency offset estimation, carrier frequency offset (CFO) compensation, integer carrier frequency offset estimation, guard interval removal and channel estimation. Processing modules for the signal symbol and the data symbols: carrier frequency offset compensation, guard interval removal, 64-point FFT, channel equalization, pilot removal, demodulation, deinterleaving, depuncturing, Viterbi decoding and descrambling. We instantiate a concrete multi-core processor from the structure of Fig. 1 and map the above tasks onto its processor cores: we designed a heterogeneous processor with 9 processor cores that performs complete IEEE 802.11a physical layer frame processing.

The design goal for each processor core is to complete its computing task within the time limit using the fewest computing units. The whole IEEE 802.11a receiver processing is partitioned by algorithm function, and each processor core uses as few computing units as possible to implement the algorithm tasks assigned to it. Because the algorithm stages differ, the configurations of the processor cores necessarily differ, which yields a heterogeneous multi-core processor with 9 processor cores that performs the whole baseband receiver processing. Mapping from left to right: part of frame synchronization, fractional carrier frequency offset estimation and carrier frequency offset compensation is mapped onto the first processor core, and the other part onto the second; 64-point FFT is mapped onto the third; integer carrier frequency offset estimation onto the fourth; part of carrier frequency offset compensation onto the fifth; part of carrier frequency offset compensation and guard interval removal onto the sixth; channel equalization and 64-point FFT onto the seventh; demodulation, deinterleaving and depuncturing onto the eighth; and Viterbi decoding and descrambling onto the ninth.

With the mapping described above, all the algorithm functions of IEEE 802.11a baseband receiver processing can be implemented; the concrete configuration is shown in Table 1.

DSP core	Number and type of CUs on the DSP core
DSP1	4 ALU + 2 MAC + 2 shifter
DSP2	4 ALU + 2 MAC + 2 shifter
DSP3	2 ALU + 2 MAC + 2 shifter
DSP4	1 ALU + 2 MAC + 1 shifter
DSP5	4 ALU + 2 MAC + 2 shifter
DSP6	4 ALU + 2 MAC + 2 shifter
DSP7	2 ALU + 2 MAC + 2 shifter
DSP8	1 ALU + 2 MAC + 1 shifter
DSP9	1 ALU + 1 shifter + Viterbi accelerator
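For readers who want to work with this configuration programmatically, the table can be captured as a small lookup structure. The sketch below is an assumption-laden convenience, not part of the patent: the names `CORE_CONFIG`, `group_by_config` and `totals` are hypothetical, and the values follow the table as printed (which lists 1 shifter for DSP4 and DSP8, whereas the summary of the invention lists 2 shift units for cores 4 and 8).

```python
from collections import defaultdict

# core number -> (ALUs, MACs, shifters, has Viterbi accelerator), per the table
CORE_CONFIG = {
    1: (4, 2, 2, False),
    2: (4, 2, 2, False),
    3: (2, 2, 2, False),
    4: (1, 2, 1, False),
    5: (4, 2, 2, False),
    6: (4, 2, 2, False),
    7: (2, 2, 2, False),
    8: (1, 2, 1, False),
    9: (1, 0, 1, True),
}


def group_by_config(config):
    """Group core numbers that share the same computing unit configuration."""
    groups = defaultdict(list)
    for core in sorted(config):
        groups[config[core]].append(core)
    return dict(groups)


def totals(config):
    """Total number of ALUs, MACs and shifters across all cores."""
    return tuple(sum(units[k] for units in config.values()) for k in range(3))


groups = group_by_config(CORE_CONFIG)
alu_total, mac_total, shifter_total = totals(CORE_CONFIG)
```

Grouping the cores this way makes the heterogeneity explicit: nine cores, but only four distinct core types.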

Embodiment of cooperation among multiple processor cores:

In the invention, each processor core handles only part of the baseband processing functions, and the whole baseband processing is completed through the cooperation of all the processor cores. After a processor core finishes its current task, it sends task scheduling information on the task scheduling bus to activate another processor core and start the corresponding processing. This information is received by the other processor cores, each of which evaluates it with the corresponding logic. The activated processor core reads the corresponding address from the task scheduling information, and its control unit sets the PC to this address, so that the corresponding execution begins. For example, in Fig. 1, after processor core 1 (core1) has handled its current task, it sends task scheduling information to processor core 2 (core2); on receiving the information, processor core 2 reads its input data from shared memory 2 and starts processing. The subsequent processor cores are controlled in the same way.

Cooperation among the processor cores mainly involves data transfer and the forwarding of control information. Data transfer means that the i-th processor core writes the data it has processed into the (i+1)-th shared memory on its right, where they become the input data of the (i+1)-th processor core; when the (i+1)-th processor core starts working, it reads the data from the (i+1)-th shared memory on its left. In practice, because the shared memories are under programmer control, the addresses of the input and output data can be fixed when the algorithm is implemented. For example, the start address for data written by the i-th processor core is set to 0x0100 and the start address for data read by the (i+1)-th processor core is likewise set to 0x0100, which guarantees data consistency between the preceding and following processor cores.
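The fixed-address handoff can be illustrated with a toy model in which one shared memory sits between two neighbouring cores. Only the address 0x0100 comes from the text; the `SharedMemory` class, its size, and the `producer` and `consumer` helpers are hypothetical names introduced for this sketch.

```python
BASE_ADDR = 0x0100  # start address agreed on by both cores (example from the text)


class SharedMemory:
    """Word-addressed memory located between two neighbouring cores."""

    def __init__(self, size=0x1000):  # size is an arbitrary assumption
        self.cells = [0] * size

    def write_block(self, start, values):
        if start < 0 or start + len(values) > len(self.cells):
            raise IndexError("block does not fit in shared memory")
        self.cells[start:start + len(values)] = values

    def read_block(self, start, count):
        if start < 0 or start + count > len(self.cells):
            raise IndexError("block does not fit in shared memory")
        return self.cells[start:start + count]


def producer(core_output, right_mem):
    """Core i writes its results into the shared memory on its right."""
    right_mem.write_block(BASE_ADDR, list(core_output))


def consumer(left_mem, count):
    """Core i+1 reads its input from the same memory, now on its left."""
    return left_mem.read_block(BASE_ADDR, count)


mem = SharedMemory()
producer([3, 1, 4, 1, 5], mem)
received = consumer(mem, 5)
```

Since both sides use the same constant, no address needs to travel with the data; the task scheduling information only has to say when the consumer may start.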

To reduce the delay caused by data transfer, the left shared memory of each processor core normally holds its input data and the right shared memory holds its output data. Transfer across cores is a special case: for example, data output by the i-th processor core may also be needed by the (i+2)-th processor core. In this case the (i+1)-th processor core must act as a relay core, as shown in Fig. 6, by executing a special instruction, which can be written as:

Its meaning is as follows. In the first cycle, the data from the left shared memory are written into the left shared memory interface register. In the second cycle, the left shared memory interface register passes the data it read directly to the right shared memory interface register through the data switching network. In the third cycle, the right shared memory interface register writes the data back to the right shared memory. By placing this instruction in a loop, N data items can be transferred from the i-th shared memory to the (i+1)-th shared memory in N+2 clock cycles.
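The N+2 cycle count follows from overlapping the three steps, and a short simulation makes this concrete. The model below is a sketch under simplifying assumptions (one item enters per cycle, no stalls, at least one item); the function name `relay` and the variable names are hypothetical.

```python
def relay(left_mem, n):
    """Move n items from the left shared memory to a new right shared memory.

    Returns (right_mem, cycles). Data items must not be None, which is used
    here to mark an empty interface register.
    """
    right_mem = []
    left_reg = right_reg = None  # left and right shared memory interface registers
    read_idx = 0
    cycles = 0
    while len(right_mem) < n:
        cycles += 1
        # Step 3: right interface register -> right shared memory
        if right_reg is not None:
            right_mem.append(right_reg)
        # Step 2: left register -> right register through the switching network
        right_reg = left_reg
        # Step 1: left shared memory -> left interface register
        if read_idx < n:
            left_reg = left_mem[read_idx]
            read_idx += 1
        else:
            left_reg = None
    return right_mem, cycles


moved, cycles = relay([10, 11, 12, 13, 14], 5)
```

The first item needs three cycles to cross the relay core, and every later item arrives one cycle after its predecessor, which gives 3 + (N - 1) = N + 2 cycles.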

Control of the processor cores is accomplished through cooperation between the task scheduling bus and the bus control units in the processor cores. The task scheduling bus carries task scheduling information between processor cores as well as the bus control information of the processor cores. See 5-6, 5-7, 5-8 and 5-9 in Fig. 5: 5-6 is the task scheduling information sent to the current processor core by the processor core on its left, and 5-9 is the task scheduling information sent to it by the processor core on its right; 5-7 is the task scheduling information the current processor core sends to the processor core on its left, and 5-8 is the task scheduling information it sends to the processor core on its right. The task scheduling information may be 37 bits wide: the upper 5 bits identify the number of the DSP core required by the next task, and the other 32 bits give the program start address that this DSP core is to execute. The bus control information is the arbitration information that a processor core sends when it needs to occupy the task scheduling bus, before it sends task scheduling information.
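The 37-bit layout (5-bit core number above a 32-bit start address) is easy to express with shifts and masks. The following is a minimal sketch assuming the core number occupies the most significant bits exactly as described; the function names are hypothetical.

```python
CORE_BITS = 5   # upper bits: number of the DSP core required by the next task
ADDR_BITS = 32  # lower bits: program start address for that core
ADDR_MASK = (1 << ADDR_BITS) - 1


def pack_task_info(core_id, start_addr):
    """Build one 37-bit task scheduling word."""
    if not 0 <= core_id < (1 << CORE_BITS):
        raise ValueError("core number does not fit in 5 bits")
    if not 0 <= start_addr <= ADDR_MASK:
        raise ValueError("start address does not fit in 32 bits")
    return (core_id << ADDR_BITS) | start_addr


def unpack_task_info(word):
    """Split a task scheduling word into (core number, start address)."""
    return word >> ADDR_BITS, word & ADDR_MASK


def is_for_core(word, my_core_id):
    """Compare the upper 5 bits with this core's own number."""
    return (word >> ADDR_BITS) == my_core_id


word = pack_task_info(2, 0x0100)
```

A 5-bit core number can address up to 32 cores, which comfortably covers the 9-core example.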

The use of the bus control unit and its arbitration mechanism are described in the following three points:

1) In this multi-core processor, the priority for claiming the task scheduling bus is left-first: a processor core has higher priority than the processor cores on its right. Consider the i-th processor core. If the bus control unit of the processor core on its left (core i-1) needs to request the task scheduling bus and sends an arbitration signal, the bus control unit of the i-th processor core receives this arbitration signal and, through its arbitration logic, forwards it to the processor core on its right. The subsequent processor cores do the same, and the whole operation completes within one clock cycle for all processor cores to the right of core i-1.

In this way core i-1 successfully occupies the task scheduling bus. In the clock cycle after it has acquired the bus, this processor core sends the 37-bit task scheduling information on the task scheduling bus. The arbitration logic of each processor core stores the current arbitration information in the arbitration information register, which serves as the select input of the gate, so when the other processor cores receive the task scheduling information they make a gating decision: if the arbitration signal stored in the arbitration information register 5-12 comes from another processor core, the gate forwards the task scheduling information just received; otherwise it sends the task scheduling information generated by the current processor core. For example, the external task scheduling information 5-6 and the self-generated task scheduling information 5-5 are the data inputs of the gate, and the arbitration information register 5-12 is the select input that decides whether the core is currently forwarding.

2) When the other processor cores receive this scheduling information, the gating decision from step 1) lets them forward it. The forwarded 37-bit information also enters the matching control logic 5-4, which matches its upper 5 bits to judge whether the task scheduling information is valid for the current processor core. If it is valid, it is stored in the FIFO memory 5-13. Once the processor core that received the scheduling information has finished its current task, its control unit sets the PC to the address in the task scheduling information, and a new task begins.

3) If the current processor core needs to request the bus, the control logic in its bus control unit instructs the arbitration logic to generate an arbitration signal. If the current processor core has not received an arbitration signal from the processor core on its left, its own arbitration signal is propagated to the right; once it has successfully obtained the task scheduling bus, it sends its task scheduling information. If the arbitration signal cannot be generated at this moment, the task scheduling information is held in task scheduling information register 5-5 until the arbitration signal takes effect. If the bus control unit on the left has already requested the bus when the current DSP core also needs it, the current control unit saves its own task scheduling information and forwards only the arbitration signal and task scheduling information coming from the left. After the left bus control unit has finished controlling the bus, the current core sends its own arbitration signal and task scheduling information.
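The three steps above can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the circuit of the invention: the function names, the 0-based core indices, the per-core 5-bit tags, and the split of the 37-bit word into a 5-bit tag (upper bits) and a 32-bit task address (lower bits) are hypothetical. The text states only that the upper 5 bits are matched and that the word carries a task address. The model also forwards only to cores on the right of the sender, following the direction in which the arbitration signal propagates.

```python
from collections import deque

TAG_BITS = 5     # upper 5 bits are matched by match control logic 5-4 (from the text)
ADDR_BITS = 32   # assumption: the remaining 32 bits carry the task start address


def arbitrate(requests):
    """Left-first arbitration: index of the leftmost requesting core, or None."""
    for index, wants_bus in enumerate(requests):
        if wants_bus:
            return index
    return None


def pack(tag, address):
    """Build a 37-bit scheduling word (the field layout is an assumption)."""
    assert 0 <= tag < (1 << TAG_BITS) and 0 <= address < (1 << ADDR_BITS)
    return (tag << ADDR_BITS) | address


def broadcast(word, sender, core_tags, fifos):
    """Forward the word past every core right of the sender; matching cores queue it."""
    tag = word >> ADDR_BITS
    address = word & ((1 << ADDR_BITS) - 1)
    for index in range(sender + 1, len(core_tags)):
        if core_tags[index] == tag:      # the word is valid for this core
            fifos[index].append(address)  # stored in the core's FIFO memory


requests = [False, True, False, True]    # cores 1 and 3 request in the same cycle
winner = arbitrate(requests)             # the leftmost requester takes the bus
fifos = [deque() for _ in requests]
broadcast(pack(tag=3, address=0x40), winner, core_tags=[0, 1, 2, 3], fifos=fifos)
```

In this run core 1 wins the bus, core 3 would keep its own scheduling word in its register until the bus is free, and only the FIFO of core 3 receives the task address.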

Embodiment within a single processor core:

As shown in Fig. 2, a single processor core 1-3 is the basic unit of the heterogeneous multi-core processor, and the heart of a single processor core is functional unit 1-7. It comprises an instruction fetch unit, a decoder, a control unit, a data switching network, computing units, memory interface units, a register group, a left shared data memory, a right shared data memory, a private data memory and an instruction memory. Together these parts form a complete functional unit, which is connected to the FIFO memory through the task control line and to the left and right shared memories through the shared-memory read/write ports.

The digital signal processing algorithm is implemented as an assembly language program, which is translated into machine instructions and written into the instruction memory of the processor core. An initial piece of task scheduling information is then given to the first processor core on the task scheduling bus, and the control unit of that core points the PC to the initial execution address in the instruction memory.

Instruction fetch: an instruction is read from the instruction memory and stored in the instruction register. Because a very long instruction word is not of fixed length, reading one instruction takes several clock cycles; the exact number of cycles depends on the length of the particular instruction. For example, the first processor core in Fig. 7 has 12 CU and MIU instruction slots, and its instruction width is 378 bits.
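The dependence of fetch time on instruction length can be illustrated with a one-line calculation. The text does not give the width of the instruction memory port, so the 128-bit port below is purely a hypothetical value chosen for the example, as is the function name.

```python
def fetch_cycles(instruction_bits, fetch_port_bits):
    """Clock cycles needed to read one instruction word through a fixed-width port."""
    if instruction_bits <= 0 or fetch_port_bits <= 0:
        raise ValueError("widths must be positive")
    return -(-instruction_bits // fetch_port_bits)   # ceiling division


# 378-bit instruction word (from the text) read through a hypothetical 128-bit port
cycles = fetch_cycles(378, 128)
```

A core with fewer instruction slots has a shorter instruction word and, under the same assumption, needs fewer fetch cycles.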

Decode: the instruction written into the instruction register is decoded and the result is stored in the decode result register. The decode result is then distributed to each CU, the private memory controller, the shared memory controller and the data switching network.

Execute: each CU and MIU receives its decode result. For a CU the decode result is divided into five kinds: operation code, destination register address, right operand register address, left operand register address, and valid flag. For an MIU the decode result is divided into four kinds: operation code, destination register address, operand register address, and valid flag.

In the very long instruction word (VLIW) format, flag field 4-1 indicates whether the current very long instruction word is a special instruction. When the flag marks a special instruction, the instruction initializes the 14 registers of global register group 2-8, of which the first 7 hold commonly used addresses and the last 7 hold commonly used immediates. Constant immediate field 4-2 holds one immediate used during this instruction word cycle. The Control field controls the data switching network, indicating which CUs or MIUs exchange data during this instruction word cycle. CU field 4-4 is the main body of the very long instruction word, and every computing unit or memory interface unit has its own entry in it (a memory interface unit, MIU, is a special kind of CU). Each instruction field 4-5 consists of five fields. CU valid flag field 4-7 indicates which CU or MIU the current microcode belongs to. Right operand field 4-8 and left operand field 4-9 give the register addresses of the right and left operands of the current CU, destination address field 4-10 gives the register address for the result of the operation, and operation field 4-11 indicates which type of operation the current CU performs.
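The field layout just described can be written down as a simple data structure. The sketch below is an illustration only: the class and attribute names are hypothetical, bit widths are deliberately omitted because the text does not give them, and the integer values in the usage lines are arbitrary placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstructionField:       # one instruction field 4-5 inside CU field 4-4
    cu_flag: int              # 4-7: which CU or MIU this microcode belongs to
    right_operand: int        # 4-8: register address of the right operand
    left_operand: int         # 4-9: register address of the left operand
    destination: int          # 4-10: register address for the result
    operation: int            # 4-11: which operation the unit performs


@dataclass
class InstructionWord:
    special: bool             # 4-1: True marks a global-register initialization word
    immediate: int            # 4-2: one immediate used in this instruction word cycle
    control: int              # Control field: routing for the data switching network
    slots: List[InstructionField] = field(default_factory=list)   # CU field 4-4


def dispatch(word):
    """Map each unit index to its instruction field, as decode distributes results."""
    return {slot.cu_flag: slot for slot in word.slots}


word = InstructionWord(
    special=False,
    immediate=7,
    control=0b01,
    slots=[
        InstructionField(cu_flag=0, right_operand=1, left_operand=2, destination=3, operation=4),
        InstructionField(cu_flag=2, right_operand=5, left_operand=6, destination=7, operation=1),
    ],
)
per_unit = dispatch(word)     # units 0 and 2 receive work in this cycle
```

Because each core type has a different number of computing units, the length of `slots` differs between core types, which is why the instruction width varies from core to core.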

In the present invention, the execute stage covers the logical operations of the computing units as well as their reads from memory and write-backs to memory. In the execute stage a computing unit cannot read data directly from memory. Instead, the read is performed through memory interface unit 2-7 and the data is placed in the memory interface register; in the next clock cycle the memory interface register passes the data to the register of the computing unit, which then carries out the corresponding calculation. Alternatively, the CU encoding of the computing unit can name the memory interface unit register directly as its left or right operand; the data is then transferred through the data switching network and the calculation completes within one clock cycle.

The processor core of the present invention has 4 memory interface units, which perform read and write operations on the left and right shared memories and the two private memories. Each interface unit contains one 32-bit immediate register for temporarily holding data. In the write-back stage of the computing units, write-back operations generally use indirect addressing.

It should be noted that, in the execute stage, the multiply-accumulate unit of the present invention is a single-cycle computing unit, i.e. it can complete one multiplication and one addition within one cycle. For example, a DSP core uses the multiply-accumulate unit (consisting of a multiplier and an accumulator) to perform a standard complex multiplication, (a+bi)*(c+di) = (ac-bd) + (ad+bc)i. In the first clock cycle, the result register of the accumulator is initialized while the memory interface unit reads from memory. In the second clock cycle, the multiplier receives the values a and c from the register of the memory interface unit. In the third clock cycle, the a*c operation is completed and its result is stored in the result register of the multiplier. In the fourth clock cycle, the a*c result is added to the initialized accumulator result register while the b*d operation is carried out at the same time, so that a multiplication and an addition are completed in the same cycle.
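The overlap of multiplication and accumulation in the example above can be modeled in a few lines. This is a behavioral sketch under assumptions: the class and method names are hypothetical, the `negate` flag stands in for whatever mechanism the hardware uses to subtract b*d, and the memory read cycles are not modeled.

```python
class MacUnit:
    """Two-stage multiply-accumulate: multiply and accumulate overlap in one cycle."""

    def __init__(self):
        self.product = 0   # multiplier result register
        self.acc = 0       # accumulator result register

    def clear(self):
        """Initialize both result registers."""
        self.product = 0
        self.acc = 0

    def cycle(self, x=0, y=0, negate=False):
        """One clock: add the previous product to acc while forming the next product."""
        self.acc += self.product
        self.product = -(x * y) if negate else x * y


def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a+bi)*(c+di) using only MAC cycles."""
    mac = MacUnit()
    parts = []
    # real part: a*c - b*d, imaginary part: a*d + b*c
    for first, second, negate in (((a, c), (b, d), True), ((a, d), (b, c), False)):
        mac.clear()                    # initialize the accumulator
        mac.cycle(*first)              # form the first product
        mac.cycle(*second, negate)     # accumulate it while forming the second product
        mac.cycle()                    # drain: accumulate the second product
        parts.append(mac.acc)
    return tuple(parts)


real, imag = complex_multiply(1, 2, 3, 4)   # (1+2i)*(3+4i)
```

The middle call to `cycle` is the step the text highlights: the accumulator absorbs a*c in the same clock in which the multiplier produces b*d.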

Claims

1. A heterogeneous multi-core digital signal processor for an OFDM wireless communication system, characterized in that the processor comprises a set of processor cores which contains at least one processor core (1-3) and is arranged in a row; said processor cores (1-3) are classified by computing capability and can be divided into different types; data transfer between processor cores of different types is realized through shared memory (1-2), and behavior control of the processor cores is carried out through bus control unit (1-6) and task scheduling bus (1-4).

2. The heterogeneous multi-core digital signal processor for an OFDM wireless communication system according to claim 1, characterized in that said processor core (1-3) comprises functional unit (1-7) and bus control unit (1-6); said bus control unit (1-6) is used to control the scheduling information of computing tasks produced and received by the functional unit; said functional unit (1-7) is used to carry out the computational logic of the digital signal and comprises instruction fetch unit (2-1), decoder (2-2), control unit (2-4), data switching network (2-5), computing unit (2-6), memory interface unit (2-7), global register group (2-8) and private memory (2-9); wherein said bus control unit (1-6) comprises control logic (5-1), arbitration logic (5-2), selector (5-3), match control logic (5-4), task scheduling information register (5-5), arbitration information register (5-12) and FIFO memory (5-13); said control logic (5-1) is used to generate task scheduling commands; said arbitration logic (5-2) is used to determine whether the current task scheduling information is valid for the processor core; and said FIFO memory (5-13) is used to temporarily store task scheduling information.

3. The heterogeneous multi-core digital signal processor for an OFDM wireless communication system according to claim 1, characterized in that said processor core (1-3) is a very long instruction word processor, and the instruction word pipeline of said very long instruction word processor passes in turn through fetch register (3-1), decode register (3-2) and execute register (3-3).

4. The heterogeneous multi-core digital signal processor for an OFDM wireless communication system according to claim 3, characterized in that said instruction word consists of flag field (4-1), constant immediate field (4-2), control field (4-3) and CU field (4-4); said CU field (4-4) is used to control each said computing unit (2-6) and memory interface unit (2-7); each computing unit (2-6) and memory interface unit (2-7) in said processor core (1-3) occupies one instruction field (4-5), and these instruction fields together constitute CU field (4-4); each said instruction field (4-5) comprises CU flag field (4-7), right operand field (4-8), left operand field (4-9), destination address field (4-10) and operation field (4-11).