CN106886690A - A heterogeneous platform for gene data computation and interpretation - Google Patents
A heterogeneous platform for gene data computation and interpretation — Download PDF / Info
- Publication number
- Publication number: CN106886690A (application CN201710055557.9A / CN201710055557A)
- Authority
- CN
- China
- Prior art keywords
- data
- code segment
- gpu
- gene
- cpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a heterogeneous platform for gene data computation and interpretation, comprising a heterogeneous processor unit, an interconnection bus module, memory, an input unit for gene computation/interpretation data and instructions, and an output unit for gene computation/interpretation results. The heterogeneous processor unit is connected through the interconnection bus module to the memory, the input unit and the output unit. The heterogeneous processor unit comprises a CPU, a GPU, a DSP and an FPGA: the CPU forms the control engine; the CPU, GPU and FPGA together form the compute engine; and the CPU, GPU and DSP together form the interpretation engine. The invention provides hardware support for improving the real-time performance and accuracy of gene data computation and the accuracy and readability of gene data interpretation, and offers high computation/interpretation efficiency, low manufacturing cost and low energy consumption.
Description
Technical field
The present invention relates to gene sequencing technology, and in particular to a heterogeneous platform for gene data computation and interpretation.
Background technology
In recent years, with the wide adoption of next-generation sequencing (NGS), the cost of gene sequencing has fallen rapidly and gene technology has begun to enter popular use. NGS-based processing involves two steps: gene data computation and gene data interpretation. Gene data computation refers to preprocessing the raw sequencing data — for example artifact removal and deduplication — so that it can be used for interpretation; gene data interpretation refers to analyzing, revealing and explaining the scientific meaning of the computed gene data in fields such as biology, medicine and health care.
The clinical application of gene technology faces two bottlenecks. The first is the sheer volume of gene data. For technical reasons, the per-sample volume of raw NGS data is very large; a single whole-genome sequencing (Whole-Genome Sequencing, WGS) sample exceeds 100 GB, so computing even one sample is already both an I/O-intensive and a compute-intensive task. Moreover, as gene technology spreads rapidly, the total volume of raw sequencing data grows exponentially, making real-time, accurate computation and transmission of gene data extremely difficult. The typical current approach is to process the data with multithreaded software on computer clusters built from many high-performance processors. This approach has shortcomings: its cost in storage, power consumption, technical support and maintenance is high; the parallel speedup it can attain while preserving accuracy still falls short of demand; and, most importantly, the growth of raw sequencing data far outpaces Moore's law, so the approach is not sustainable. The second bottleneck is the accuracy and readability of gene data interpretation. The typical interpretation method reconstructs an individual's genome from the computed sequencing data against a human reference genome. However, current reference genomes such as GRCh38 are built from a limited number of samples: they neither represent the diversity of all humankind nor are they complete, so the reference information can introduce bias when detecting variants unique to an individual genome, and it lacks deep cross-analysis with other biological and medical information. In addition, gene data interpretation remains largely confined to the professional domain and lacks readability for the non-specialist public — that is, accessible, multi-format explanations of the direct biological meaning and the indirect health implications of gene data.
The processor types common in computer systems are the central processing unit (CPU), the field-programmable gate array (FPGA), the graphics processing unit (GPU) and the digital signal processor (DSP). A modern high-performance CPU typically contains multiple processor cores and supports multithreading in hardware, but its design target remains general-purpose applications. Compared with specialized computation, general-purpose programs exhibit less parallelism and require more complex control, so the on-chip hardware resources of a CPU are devoted mainly to complex control rather than to computation; there is no function-specific hardware, and the degree of parallelism it can support is limited. The FPGA is a semi-custom circuit. Its advantages: FPGA-based system development has a short design cycle and low development cost; power consumption is low; the configuration can be modified after manufacture, giving high design flexibility and low design risk. Its disadvantage: for the same function, an FPGA is slower and occupies more circuit area than an application-specific integrated circuit (ASIC). As the technology evolves toward higher density, larger capacity, lower power and more integrated hard intellectual-property (IP) cores, the FPGA's disadvantages are shrinking and its advantages growing. Compared with a CPU, an FPGA allows parallel computation to be custom-implemented, modified and extended in a hardware description language. The GPU was originally a microprocessor dedicated to image processing, supporting basic graphics tasks such as texture mapping and polygon shading in hardware. Because graphics involves general mathematical computation such as matrix and vector operations, and the GPU has a highly parallel architecture, GPU computing has risen with the development of the surrounding software and hardware: GPUs are no longer limited to graphics and are also used for parallel computation in linear algebra, signal processing, numerical simulation and so on, delivering tens or even hundreds of times the performance of a CPU. Current GPUs have two shortcomings: first, constrained by the GPU's hardware architecture, many parallel algorithms cannot execute efficiently on it; second, a running GPU generates a great deal of heat, and its energy consumption is high. The DSP is a microprocessor that rapidly analyzes, converts, filters, detects, modulates and demodulates signals by digital methods. Its internal chip architecture is specially optimized, for example with hardware for fast, high-precision multiplication. With the arrival of the digital age, DSPs are widely used in smart devices, resource exploration, numerical control, biomedicine, aerospace and other fields, featuring low power consumption, high precision, and two-dimensional and multidimensional processing.
In summary, the four devices above each have their strengths and their limitations. Given the two bottlenecks in the clinical application of gene technology described above, how to build a hybrid-architecture platform from these processors to compute and interpret massive gene data has become a key technical problem in urgent need of a solution.
The content of the invention
The technical problem to be solved by the present invention: in view of the above problems of the prior art, to provide a heterogeneous platform for gene data computation and interpretation that supplies hardware support for improving the real-time performance and accuracy of gene data computation and the accuracy and readability of gene data interpretation, with high computation/interpretation efficiency, low manufacturing cost and low energy consumption.
In order to solve the above technical problem, the technical solution adopted by the present invention is:
A heterogeneous platform for gene data computation and interpretation, comprising a heterogeneous processor unit, an interconnection bus module, memory, an input unit for gene computation/interpretation data and instructions, and an output unit for gene computation/interpretation results. The heterogeneous processor unit is connected through the interconnection bus module to the memory, the input unit and the output unit. The heterogeneous processor unit comprises a CPU, a GPU, a DSP and an FPGA, in which the CPU forms the control engine, the CPU, GPU and FPGA together form the compute engine, and the CPU, GPU and DSP together form the interpretation engine. The control engine receives gene computation/interpretation data and instructions through the input unit and divides them into code segments. When a code segment's task type is a control task, its instructions and data are scheduled to the CPU for processing; when its task type is a computation task, its instructions and data are scheduled to the compute engine and the result is emitted through the output unit; when its task type is an interpretation task, its instructions and data are scheduled to the interpretation engine and the result is emitted through the output unit.
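The top-level routing performed by the control engine can be sketched as follows. This is a minimal illustration only: the `TaskType` names, the `CodeSegment` type and the string return values are invented for this sketch and do not appear in the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto

class TaskType(Enum):
    CONTROL = auto()    # handled by the control engine (CPU)
    COMPUTE = auto()    # handled by the compute engine (CPU + GPU + FPGA)
    INTERPRET = auto()  # handled by the interpretation engine (CPU + GPU + DSP)

@dataclass
class CodeSegment:
    name: str
    task_type: TaskType

def dispatch(segment: CodeSegment) -> str:
    """Route one code segment to the engine matching its task type.

    Compute and interpretation results would then be emitted through the
    result output unit; here we simply return the chosen engine's name.
    """
    if segment.task_type is TaskType.CONTROL:
        return "CPU"
    if segment.task_type is TaskType.COMPUTE:
        return "compute-engine"
    return "interpretation-engine"
```

Each engine then applies its own device-selection steps (steps A1)–A9) and B1)–B8) below) to pick a concrete processor.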
Preferably, the FPGA comprises a crossbar switch, an I/O control unit and an accelerator unit. The I/O control unit and the accelerator unit are each connected to the crossbar switch, and the I/O control unit is connected to the interconnection bus module. The accelerator unit comprises at least one of a hidden-Markov-model computation accelerator, for hardware acceleration of hidden-Markov-model computation, and a hash-function computation accelerator, for hardware acceleration of hash computation.
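The patent names the accelerated kernel only as "hidden-Markov-model computation". For reference, the kind of computation such an accelerator offloads is the HMM forward recurrence sketched below — a generic software illustration (as used, for example, in pair-HMM likelihoods during variant calling), not the patent's hardware design; all variable names are ours:

```python
def hmm_forward(obs, init, trans, emit):
    """Total likelihood P(obs) of an observation sequence under an HMM.

    init[s]     -- initial probability of hidden state s
    trans[s][t] -- transition probability from state s to state t
    emit[s][o]  -- probability that state s emits symbol o
    """
    # alpha[s] = P(obs[0..i], state_i = s), swept left to right
    alpha = [init[s] * emit[s][obs[0]] for s in range(len(init))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(len(alpha))) * emit[t][o]
            for t in range(len(init))
        ]
    return sum(alpha)
```

The double loop over states runs once per observed symbol, so pipelines that evaluate such likelihoods over millions of sequencing reads are natural candidates for a hardware pipeline on the FPGA.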
Preferably, the I/O control unit comprises a PCIe interface, a DMA controller, a peripheral interface unit (PIU) and a DDR controller. The crossbar switch is connected to the DMA controller, the PIU and the DDR controller; the DMA controller and the PIU are interconnected; the PCIe interface is connected to the DMA controller; and the PCIe interface and the DDR controller are each connected to the interconnection bus module.
Preferably, the interconnection bus module comprises an HCCLink bus module and an HNCLink bus module. The CPU, GPU, DSP and FPGA are each connected to the memory through the HCCLink bus module, and each connected through the HNCLink bus module to the input unit for gene computation/interpretation data and instructions and to the output unit for gene computation/interpretation results.
Preferably, the input unit for gene computation/interpretation data and instructions comprises at least one of an input device, a general-purpose interface module, a network interface module, a multimedia input interface module, an external storage device and a sensor.
Preferably, the output unit for gene computation/interpretation results comprises at least one of a display device, a general-purpose interface module, a network interface module, a multimedia output interface module and an external storage device.
Preferably, scheduling a code segment's instructions and data to the compute engine comprises the following detailed steps:
A1) Determine whether the code segment admits instruction-level parallel execution, pipelined execution, or data-parallel execution; if it admits none of the three, jump to step A7) and exit; otherwise jump to step A2).
A2) Determine whether the code segment admits only data-parallel execution; if so, jump to step A3); otherwise jump to step A6).
A3) Determine whether the total overhead of assigning the code segment to the FPGA for optimized execution (i.e. parallel execution; likewise below) is less than the total overhead of assigning it to the GPU. The FPGA total overhead comprises the communication overhead of exchanging data and instructions between the CPU and the FPGA, the FPGA's memory-access overhead and the FPGA's computation overhead; the GPU total overhead comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory-access overhead and the GPU's computation overhead. If the condition holds, jump to step A6); otherwise jump to step A4).
A4) Determine whether the code segment prioritizes energy consumption; if so, jump to step A6); otherwise jump to step A5).
A5) Determine whether the code segment's gene computation is suited to GPU acceleration; if so, jump to step A8); otherwise jump to step A7).
A6) Using all applicable FPGA acceleration methods in combination — at least one of instruction parallelism, pipelining and data parallelism — determine whether the total overhead of assigning the code segment to the FPGA for optimized execution is less than the total overhead of executing it on the CPU; if so, jump to step A9); otherwise jump to step A7).
A7) Schedule the code segment's instructions and data to the CPU for processing, and exit.
A8) Schedule the code segment's instructions and data to the GPU for processing, and exit.
A9) Schedule the code segment's instructions and data to the FPGA for processing, and exit.
Preferably, the detailed steps of step A5) comprise:
A5.1) Determine whether the code segment's gene computation admits data-parallel execution; if so, jump to step A5.2); otherwise jump to step A7).
A5.2) Determine whether the total overhead of assigning the code segment to the GPU for optimized execution is less than the total overhead of executing it on the CPU. The GPU total overhead comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory-access overhead and the GPU's computation overhead; the CPU total overhead comprises the CPU's memory-access overhead and the CPU's computation overhead. If the condition holds, jump to step A8); otherwise jump to step A7).
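Steps A1)–A9), together with the A5) refinement, form a decision tree that can be sketched as follows. This is an illustration only: the `Segment` fields and the `cost` callable — estimated communication + memory-access + computation overhead per device — are our own modelling of the patent's overhead comparisons, not its interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Segment:
    instr_parallel: bool = False  # admits instruction-level parallelism
    pipeline: bool = False        # admits pipelined execution
    data_parallel: bool = False   # admits data-parallel execution
    energy_first: bool = False    # energy consumption has priority

def schedule_compute(seg: Segment, cost: Callable[[str], float]) -> str:
    """Return the device chosen by steps A1)-A9) for a compute task."""
    # A1: no parallelism of any kind -> CPU (A7)
    if not (seg.instr_parallel or seg.pipeline or seg.data_parallel):
        return "CPU"
    # A2: only data parallelism -> weigh FPGA against GPU first
    only_data = seg.data_parallel and not (seg.instr_parallel or seg.pipeline)
    if only_data:
        # A3: FPGA does not beat GPU, and A4: energy not prioritized
        # -> GPU-suitability check (A5 / A5.1 / A5.2)
        if not (cost("FPGA") < cost("GPU")) and not seg.energy_first:
            if seg.data_parallel and cost("GPU") < cost("CPU"):
                return "GPU"   # A8
            return "CPU"       # A7
    # A6: with all FPGA acceleration methods, accept FPGA if cheaper than CPU
    return "FPGA" if cost("FPGA") < cost("CPU") else "CPU"  # A9 / A7
```

For example, with estimated overheads GPU 5, FPGA 10, CPU 20, a purely data-parallel segment lands on the GPU, while a pipelined segment lands on the FPGA.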
Preferably, scheduling a code segment's instructions and data to the interpretation engine comprises the following detailed steps:
B1) Determine whether the code segment is digital signal processing, non-graphics multimedia processing, or graphics/image processing; if it is none of the three, jump to step B7); otherwise jump to step B2).
B2) Determine whether the code segment is graphics/image processing; if so, jump to step B3); otherwise jump to step B5).
B3) Determine whether the total overhead of assigning the code segment to the DSP for optimized execution is less than the total overhead of assigning it to the GPU. The DSP total overhead comprises the communication overhead of exchanging data and instructions between the CPU and the DSP, the DSP's memory-access overhead and the DSP's computation overhead; the GPU total overhead comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory-access overhead and the GPU's computation overhead. If the condition holds, jump to step B5); otherwise jump to step B4).
B4) Determine whether the code segment prioritizes energy consumption; if so, jump to step B5); otherwise jump to step B7).
B5) Determine whether the total overhead of assigning the code segment to the DSP for optimized execution is less than the total overhead of executing it on the CPU, where the CPU total overhead comprises the CPU's memory-access overhead and the CPU's computation overhead; if so, jump to step B6); otherwise jump to step B8).
B6) Schedule the code segment's instructions and data to the DSP for processing, and exit.
B7) Determine whether the code segment's gene interpretation is suited to GPU acceleration; if so, schedule its instructions and data to the GPU for processing and exit; otherwise jump to step B8).
B8) Schedule the code segment's instructions and data to the CPU for processing, and exit.
Preferably, the detailed steps of step B7) comprise:
B7.1) Determine whether the code segment is graphics/image processing; if so, jump to step B7.3); otherwise jump to step B7.2).
B7.2) Determine whether the code segment admits data-parallel execution; if so, jump to step B7.3); otherwise jump to step B8).
B7.3) Determine whether the total overhead of assigning the code segment to the GPU for optimized execution is less than the total overhead of executing it on the CPU. The GPU total overhead comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory-access overhead and the GPU's computation overhead; the CPU total overhead comprises the CPU's memory-access overhead and the CPU's computation overhead. If the condition holds, jump to step B7.4); otherwise jump to step B8).
B7.4) Schedule the code segment's instructions and data to the GPU for processing, and exit.
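Steps B1)–B8), with the B7) refinement, can likewise be sketched as a decision tree. Again an illustration only: the `Task` fields and the per-device `cost` callable (communication + memory-access + computation overhead, as defined in the steps above) are our own modelling, not the patent's interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    signal: bool = False         # digital signal processing
    multimedia: bool = False     # non-graphics multimedia processing
    graphics: bool = False       # graphics/image processing
    data_parallel: bool = False  # admits data-parallel execution
    energy_first: bool = False   # energy consumption has priority

def gpu_or_cpu(t: Task, cost: Callable[[str], float]) -> str:
    """Steps B7.1)-B7.4): the GPU needs graphics work or data parallelism,
    and must be cheaper overall than the CPU; otherwise fall back (B8)."""
    if (t.graphics or t.data_parallel) and cost("GPU") < cost("CPU"):
        return "GPU"
    return "CPU"

def schedule_interpret(t: Task, cost: Callable[[str], float]) -> str:
    """Return the device chosen by steps B1)-B8) for an interpretation task."""
    # B1: none of the three workload classes -> GPU-suitability check (B7)
    if not (t.signal or t.multimedia or t.graphics):
        return gpu_or_cpu(t, cost)
    # B2/B3/B4: graphics work goes to the DSP path only if the DSP beats
    # the GPU, or if energy consumption is prioritized
    if t.graphics and not (cost("DSP") < cost("GPU")) and not t.energy_first:
        return gpu_or_cpu(t, cost)
    # B5/B6: accept the DSP if it is cheaper than the CPU, else CPU (B8)
    return "DSP" if cost("DSP") < cost("CPU") else "CPU"
```

With overheads DSP 5, GPU 10, CPU 20, a graphics segment is kept on the DSP; if the DSP estimate rises above the CPU's, the segment falls back to the GPU-suitability check.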
The heterogeneous platform for gene data computation and interpretation of the present invention has the following advantages:
1. Platform generality. The platform, based on a CPU plus FPGA, GPU and DSP, lets designers develop diverse gene data computation, interpretation and combined computation/interpretation application flows without redesigning the hardware system; it can also host other open-source or commercial gene computation/interpretation application software without hardware redesign; and a heterogeneous programming language (such as OpenCL) can give application development across the whole platform a uniform model.
2. Good scalability. The platform can be flexibly extended and reconfigured as application demands differ and change.
3. Wide applicability. The platform can serve either as a local processing device for gene data computation and interpretation, or as a processing node for gene data computation and interpretation in a cluster or cloud-computing environment.
4. High configurability. On the software side, all four core components — CPU, FPGA, GPU and DSP — are programmable devices. On the hardware side, the FPGA can be incrementally reconfigured on demand — functions changed and/or added — even after the system has been finalized, manufactured and installed. At the application-integration level, the organization, scale and interconnection of the system's components can be configured to the various requirements of gene data computation and interpretation according to the respective strengths of the CPU, FPGA, GPU, DSP and other hardware, so that the parts divide the work rationally, cooperate, and optimize the application flow to best effect. The system thus gives system and application designers good design flexibility and incremental configurability, and is easy to upgrade for new applications.
5. A match for the heterogeneous-computing demands of gene data computation and interpretation. The platform can satisfy, now and in the future, the hardware demands of heterogeneous computation over the diverse structured and unstructured data fused in gene data analysis — text, pictures, speech, audio, video and other electrical signals.
6. High performance. The platform provides hardware support for high-performance gene data computation and interpretation in three respects: first, the hardware needed simultaneously for task parallelism, data parallelism and algorithm-level hardware acceleration; second, the hardware needed simultaneously for control tasks, transactional tasks, non-data-intensive computation tasks and data-intensive computation tasks; third, the hardware needed simultaneously for fused processing and analysis of text, pictures, speech, audio, video and other electrical signals.
7. Low cost. Compared with processing gene data computation and interpretation entirely in software on existing computer clusters or cloud-computing platforms, the platform improves performance while reducing the costs of design, storage, networking, power consumption, technical support and maintenance.
8. Low power consumption. By using the FPGA and DSP to take over part of the work of the CPU and GPU, the platform reduces energy consumption while improving performance and diversifying functionality.
Brief description of the drawings
Fig. 1 is the circuit theory schematic diagram of embodiment of the present invention heterogeneous platform.
Fig. 2 is the engine structure schematic diagram of embodiment of the present invention heterogeneous platform.
Fig. 3 is the circuit theory schematic diagram of FPGA in embodiment of the present invention heterogeneous platform.
Fig. 4 is the scheduling flow schematic diagram that embodiment of the present invention heterogeneous platform controls engine.
Fig. 5 is the schematic flow sheet that embodiment of the present invention heterogeneous platform dispatches computing engines.
Fig. 6 is that embodiment of the present invention heterogeneous platform scheduling computing engines judge whether to be adapted to the schematic flow sheet that GPU accelerates.
Fig. 7 is the schematic flow sheet of embodiment of the present invention heterogeneous platform scheduling solution read engine.
Fig. 8 is that embodiment of the present invention heterogeneous platform scheduling solution read engine judges whether to be adapted to the schematic flow sheet that GPU accelerates.
Reference numerals: 1, heterogeneous processor unit; 11, control engine; 12, computing engine; 13, interpretation engine; 2, interconnection bus module; 21, HCCLink bus module; 22, HNCLink bus module; 3, memory; 4, gene computation and interpretation data/instruction input unit; 5, gene computation and interpretation result output unit.
Detailed description of the embodiments
As shown in Fig. 1 and Fig. 2, the heterogeneous platform for gene data computation and interpretation of this embodiment includes a heterogeneous processor unit 1, an interconnection bus module 2, a memory 3, a gene computation and interpretation data/instruction input unit 4, and a gene computation and interpretation result output unit 5. The heterogeneous processor unit 1 is connected through the interconnection bus module 2 to the memory 3, the input unit 4 and the result output unit 5. The heterogeneous processor unit 1 includes a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor) and an FPGA (Field Programmable Gate Array), where the CPU constitutes a control engine 11, the CPU, GPU and FPGA together constitute a computing engine 12, and the CPU, GPU and DSP together constitute an interpretation engine 13. The control engine 11 receives gene computation and interpretation data and instructions through the input unit 4 and divides them into code segments. When the task type of a code segment is a control task, its instructions and data are scheduled onto the CPU for processing; when the task type is a computing task, its instructions and data are scheduled onto the computing engine 12 for processing and the result is output through the result output unit 5; when the task type is an interpretation task, its instructions and data are scheduled onto the interpretation engine 13 for processing and the result is output through the result output unit 5.
In this embodiment, there may be one or more CPUs, each containing one or more processor cores (Processor Core); likewise one or more GPUs, one or more DSPs, and one or more FPGAs. Any of the CPUs, GPUs, DSPs and FPGAs can interconnect and exchange data and instructions over the interconnection bus module 2, and through it can also interconnect and exchange data and instructions with the memory 3, the gene computation and interpretation data/instruction input unit 4 and the result output unit 5. Of course, the bus form used to interconnect these components is not limited to any specific interconnection scheme; various concrete implementations may be used as needed.
As shown in Fig. 3, the FPGA includes a crossbar switch (Crossbar), an IO control unit and an accelerator unit; the IO control unit and the accelerator unit are each connected to the crossbar switch, and the IO control unit is connected to the interconnection bus module 2. The accelerator unit includes a hidden Markov model (Hidden Markov Model, HMM) computation accelerator for hardware-accelerating HMM computation, and a hash function (hash function) computation accelerator for hardware-accelerating hash computation. In this embodiment the crossbar switch is specifically an Advanced eXtensible Interface (AXI) crossbar. The accelerator unit may also, as needed, contain only the HMM computation accelerator, only the hash function computation accelerator, or further similar hardware accelerators that hardware-accelerate other computations.
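The core of the HMM computation that the accelerator unit offloads is the forward recursion. As a point of reference, a minimal software version of that recursion is sketched below; this is the textbook forward algorithm, and the patent does not specify which HMM variant the FPGA actually implements, so treat it as an illustrative assumption.

```python
# Hypothetical software reference for the HMM forward recursion that the
# FPGA accelerator unit would hardware-accelerate; the exact HMM variant
# used by the platform is not specified in the source.

def hmm_forward(obs, start_p, trans_p, emit_p):
    """Return P(obs) for an HMM.

    obs     : list of observation symbols
    start_p : dict state -> initial probability
    trans_p : dict state -> dict state -> transition probability
    emit_p  : dict state -> dict symbol -> emission probability
    """
    states = list(start_p)
    # Initialization: alpha_1(s) = pi(s) * b_s(o_1)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Recursion: alpha_t(s) = b_s(o_t) * sum_s' alpha_{t-1}(s') * a_{s',s}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[sp] * trans_p[sp][s] for sp in states)
            for s in states
        }
    # Termination: P(obs) = sum_s alpha_T(s)
    return sum(alpha.values())
```

The recursion is a chain of multiply-accumulate operations over small matrices, which is exactly the kind of fixed dataflow that maps well onto FPGA pipelines.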
As shown in Fig. 3, the IO control unit includes a PCIE (Peripheral Component Interconnect Express) interface, a DMA (Direct Memory Access) controller, a PIU (Peripheral Interface Unit) and a DDR controller. The crossbar switch is connected to the DMA controller, the PIU and the DDR controller; the DMA controller and the PIU are interconnected; the PCIE interface is connected to the DMA controller; and the PCIE interface and the DDR controller are each connected to the interconnection bus module 2. The DDR controller provides DDR access and storage for large-volume data; in this embodiment it is specifically a DDR4 controller. The PCIE interface, the DMA controller and the PIU cooperate to transfer data and instructions between the FPGA and the CPU, and between the FPGA and the GPU. The crossbar switch interconnects the DMA controller, the PIU, the DDR controller, the HMM computation accelerator, the hash function computation accelerator and any other accelerators, providing the paths over which data and instructions are transferred among them.
As shown in Fig. 1, the interconnection bus module 2 includes an HCCLink (Heterogeneous Computing Cache Coherence Link) bus module 21 and an HNCLink (Heterogeneous Computing Non-Coherence Link) bus module 22. The CPU, GPU, DSP and FPGA are each connected to the memory 3 through the HCCLink bus module 21, and are each connected to the gene computation and interpretation data/instruction input unit 4 and the result output unit 5 through the HNCLink bus module 22. The HCCLink bus module 21 interconnects the CPU, FPGA, GPU and DSP with the DDR4 memory array and carries their exchanges of data and instructions. The HNCLink bus module 22 interconnects the CPU, FPGA, GPU and DSP for exchanging control instructions among themselves, and interconnects them with the input/output (I/O) devices for exchanging data and instructions.
In this embodiment, the memory 3 is a DDR4 memory array (Memory Array).
In this embodiment, the gene computation and interpretation data/instruction input unit 4 includes at least one of an input device, a general-purpose interface module, a network interface module, a multimedia input interface module, an external storage device and a sensor. The input device includes at least one of a keyboard, a mouse, a trackball and a trackpad; the general-purpose interface module includes at least one of a boundary-scan interface module and a USB interface module; the network interface module includes at least one of an Ethernet interface module, a Long Term Evolution (LTE) interface module, a Wi-Fi interface module and a Bluetooth interface module; the multimedia input interface module includes at least one of an analog audio input interface, a digital audio input interface and a video input interface; the external storage device includes at least one of FLASH memory and a solid-state drive (SSD); and the sensor includes at least one of a temperature sensor, a heart-rate sensor and a fingerprint sensor.
In this embodiment, the gene computation and interpretation result output unit 5 includes at least one of a display device, a general-purpose interface module, a network interface module, a multimedia output interface module and an external storage device. The display device includes at least one of a cathode-ray tube (CRT), a liquid-crystal display (LCD) and a light-emitting diode (LED) display; the general-purpose interface module includes at least one of a boundary-scan interface module and a USB interface module; the network interface module includes at least one of an Ethernet interface module, an LTE interface module, a Wi-Fi interface module and a Bluetooth interface module; the multimedia output interface module includes at least one of an analog audio output interface, a digital audio output interface and a video output interface; and the external storage device includes at least one of FLASH memory and a solid-state drive (SSD).
As shown in Fig. 4, the control engine 11 receives gene computation and interpretation data and instructions through the input unit 4 and divides them into code segments, and then, according to the task type of each code segment, performs integrated scheduling across the computing engine 12 formed by the CPU, GPU and FPGA and the interpretation engine 13 formed by the CPU, GPU and DSP: when the task type of a code segment is a control task, its instructions and data are scheduled onto the CPU for processing; when the task type is a computing task, its instructions and data are scheduled onto the computing engine 12 for processing and the result is output through the result output unit 5; when the task type is an interpretation task, its instructions and data are scheduled onto the interpretation engine 13 for processing and the result is output through the result output unit 5.
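The three-way dispatch just described can be sketched as a small routing function. This is a hypothetical illustration: the task-type labels and the engine callables are inventions of this sketch, not names specified by the patent.

```python
# Hypothetical sketch of the control engine's three-way dispatch (Fig. 4).
# Task-type labels and engine callables are invented for illustration.

def control_engine_dispatch(code_segments, cpu, compute_engine, interpret_engine):
    """Route each code segment by task type, collecting results for output."""
    results = []
    for seg in code_segments:
        if seg["type"] == "control":
            cpu(seg)                               # control tasks run on the CPU
        elif seg["type"] == "compute":
            results.append(compute_engine(seg))    # CPU+GPU+FPGA computing engine
        elif seg["type"] == "interpret":
            results.append(interpret_engine(seg))  # CPU+GPU+DSP interpretation engine
        else:
            raise ValueError("unknown task type: %r" % seg["type"])
    return results  # forwarded to the result output unit
```

Only compute and interpret results are forwarded to the output unit here, mirroring the text, where control tasks produce no output through unit 5.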
In this embodiment, the functions of the CPU are as follows: scheduling and controlling one or more FPGAs and exchanging data and instructions with them; scheduling and controlling one or more GPUs and exchanging data and instructions with them; scheduling and controlling one or more DSPs and exchanging data and instructions with them; exchanging data and instructions with one or more memories; receiving and processing data and instructions from one or more input devices; sending data and instructions to one or more output devices; in the gene data computation flow, executing scheduling tasks and transaction-type tasks and cooperating with one or more FPGAs and one or more GPUs to execute gene data computation tasks; in the gene data interpretation flow, executing scheduling tasks and transaction-type tasks and cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks; and in the combined gene data computation and interpretation flow, executing scheduling tasks and transaction-type tasks, cooperating with one or more FPGAs and one or more GPUs to execute gene data computation tasks, and cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks.
In this embodiment, the functions of the FPGA are as follows: exchanging data and instructions with one or more CPUs; optionally scheduling and controlling one or more GPUs and exchanging data and instructions with them; optionally scheduling and controlling one or more DSPs and exchanging data and instructions with them; exchanging data and instructions with one or more memories; optionally receiving and processing data and instructions from one or more input devices; optionally sending data and instructions to one or more output devices; in the gene data computation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data computation tasks, and optionally executing scheduling tasks and transaction-type tasks; in the gene data interpretation flow, optionally executing scheduling tasks and transaction-type tasks, and optionally cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks; and in the combined gene data computation and interpretation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data computation tasks, optionally cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks, and optionally executing scheduling tasks and transaction-type tasks.
In this embodiment, the functions of the GPU are as follows: exchanging data and instructions with one or more CPUs; optionally exchanging data and instructions with one or more FPGAs; optionally exchanging data and instructions with one or more DSPs; exchanging data and instructions with one or more memories; in the gene data computation flow, cooperating with one or more FPGAs and one or more CPUs to execute gene data computation tasks; in the gene data interpretation flow, cooperating with one or more DSPs and one or more CPUs to execute gene data interpretation tasks; and in the combined gene data computation and interpretation flow, cooperating with one or more FPGAs and one or more CPUs to execute gene data computation tasks, and cooperating with one or more DSPs and one or more CPUs to execute gene data interpretation tasks.
In this embodiment, the functions of the DSP are as follows: exchanging data and instructions with one or more CPUs; optionally exchanging data and instructions with one or more FPGAs; optionally exchanging data and instructions with one or more GPUs; exchanging data and instructions with one or more memories; optionally receiving and processing data and instructions from one or more input devices; optionally sending data and instructions to one or more output devices; in the gene data interpretation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data interpretation tasks; and in the combined gene data computation and interpretation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data interpretation tasks.
In this embodiment, the functions of the memory 3 are as follows: storing one or more sets of gene sequencing data, which may be raw data and/or compressed data, with no restriction on the compression algorithm; storing one or more gene reference sequences and one or more corresponding identifiers for each; storing one or more sets of known variation data; storing other input data related to gene data computation; storing other input data related to gene data interpretation; storing other input data related to combined gene data computation and interpretation; and storing intermediate results and final data in the gene data computation flow, in the gene data interpretation flow, and in the combined computation and interpretation flow. The memory type is not limited, e.g. DDR3 (Dual Data Rate 3), DDR4, etc.
In this embodiment, the functions of the gene computation and interpretation data/instruction input unit 4 are as follows: inputting the data and instructions needed by the gene data computation flow; inputting the data and instructions needed by the gene data interpretation flow; and inputting the data and instructions needed by the combined gene data computation and interpretation flow. The type of input device is not limited; examples include input devices such as a keyboard (Keyboard), mouse (Mouse), trackball (Trackball) or trackpad (touch pad); general-purpose interfaces such as boundary scan (Joint Test Action Group, JTAG) or USB (Universal Serial Bus); network ports such as Ethernet, Long Term Evolution (LTE), Wireless Fidelity (Wi-Fi) or Bluetooth; multimedia interfaces such as an analog audio input interface (e.g. a 3.5 mm stereo mini jack), a digital audio input interface (e.g. Sony/Philips Digital Interface, S/PDIF) or a video input interface (e.g. High Definition Multimedia Interface, HDMI); external storage devices such as FLASH memory or a solid-state drive (Solid State Drive, SSD); or sensors (Sensor) such as a temperature sensor (for measuring body temperature), an optical sensor (for measuring heart rate) or a fingerprint sensor (for capturing fingerprints). The form of the input data and instructions is not limited either, e.g. electrical signals, text, pictures, voice, audio, video, etc., and any combination thereof.
In this embodiment, the functions of the gene computation and interpretation result output unit 5 are as follows: outputting the data and instructions generated by the gene data computation flow; outputting the data and instructions generated by the gene data interpretation flow; and outputting the data and instructions generated by the combined gene data computation and interpretation flow. The type of output device is not limited; examples include display devices such as a cathode-ray tube (CRT), liquid-crystal display (LCD) or light-emitting diode (LED) display; general-purpose interfaces such as JTAG or USB; network ports such as Ethernet, LTE, Wi-Fi or Bluetooth; multimedia interfaces such as an analog audio output interface (e.g. a 3.5 mm stereo mini jack), a digital audio output interface (e.g. S/PDIF) or a video output interface (e.g. HDMI); or external storage devices such as a solid-state drive (Solid State Drive, SSD). The form of the output data and instructions is not limited, e.g. electrical signals, text, pictures, voice, audio, video, etc., and any combination thereof. Referring to Fig. 1, the input unit 4 and the result output unit 5 may be realized partly on shared devices, such as the general-purpose interface module, the network interface module and the external storage device.
As shown in Fig. 4 and Fig. 5, the detailed steps of scheduling the instructions and data of a code segment onto the computing engine 12 are:
A1) Judge whether the code segment can be executed with instruction-level parallelism, with pipelining, or with data parallelism; if none of the three is possible, jump to step A7 and exit; otherwise, jump to step A2.
A2) Judge whether the code segment can only be executed with data parallelism; if so, jump to step A3; otherwise, jump to step A6.
A3) Judge whether the total overhead of assigning the code segment to the FPGA for optimized (i.e. parallel, likewise below) execution is lower than the total overhead of assigning it to the GPU for optimized execution. The total overhead on the FPGA comprises the communication overhead of exchanging data and instructions between the CPU and the FPGA, the memory-access overhead of the FPGA and the computation overhead of the FPGA; the total overhead on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU and the computation overhead of the GPU. If it holds, jump to step A6; otherwise, jump to step A4.
A4) Judge whether the code segment is energy-consumption-priority; if so, jump to step A6; otherwise, jump to step A5.
A5) Judge whether the gene computation of the code segment is suited to GPU acceleration; if so, jump to step A8; otherwise, jump to step A7.
A6) Using all acceleration methods applicable on the FPGA, the acceleration methods including at least one of instruction-level parallelism, pipelining and data parallelism, judge whether the total overhead of assigning the code segment to the FPGA for optimized execution is lower than the total overhead of executing it on the CPU; if so, jump to step A9; otherwise, jump to step A7.
A7) Schedule the instructions and data of the code segment onto the CPU for processing, and exit.
A8) Schedule the instructions and data of the code segment onto the GPU for processing, and exit.
A9) Schedule the instructions and data of the code segment onto the FPGA for processing, and exit.
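Assuming the per-processor overhead estimates are available as plain numbers, the A1-A9 flow can be condensed into one decision function. This is a hypothetical sketch: the feature flags and the cost dictionary are inventions of this illustration, since the patent names the overhead terms but does not say how they are estimated.

```python
# Hypothetical sketch of the computing-engine scheduling flow (steps A1-A9).
# seg holds boolean feature flags of the code segment; cost maps a processor
# name to its estimated total overhead (communication + memory access +
# computation). Both are stand-ins for analyses the patent does not specify.

def schedule_compute(seg, cost):
    """Return which processor ('CPU', 'GPU' or 'FPGA') runs the segment."""
    # A1: no instruction-, pipeline- or data-level parallelism -> CPU (A7)
    if not (seg["instr_parallel"] or seg["pipeline"] or seg["data_parallel"]):
        return "CPU"
    # A2: segments that only admit data parallelism go through A3-A5
    only_data = seg["data_parallel"] and not (seg["instr_parallel"] or seg["pipeline"])
    if only_data:
        # A3: if the FPGA's total overhead does NOT beat the GPU's ...
        if cost["FPGA"] >= cost["GPU"]:
            # A4: ... and the segment is not energy-priority ...
            if not seg["energy_priority"]:
                # A5: ... offload to the GPU only if it beats plain CPU execution
                if cost["GPU"] < cost["CPU"]:
                    return "GPU"   # A8
                return "CPU"       # A7
    # A6: FPGA with all applicable acceleration vs plain CPU execution
    if cost["FPGA"] < cost["CPU"]:
        return "FPGA"  # A9
    return "CPU"       # A7
```

Note how energy-priority segments fall through A4 to the FPGA comparison in A6, matching the flow's bias toward the FPGA when power matters.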
As shown in Fig. 6, the detailed steps of step A5 are:
A5.1) Judge whether the gene computation of the code segment can be executed with data parallelism; if so, jump to step A5.2; otherwise, jump to step A7.
A5.2) Judge whether the total overhead of assigning the code segment to the GPU for optimized execution is lower than the total overhead of executing it on the CPU. The total overhead on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU and the computation overhead of the GPU; the total overhead on the CPU comprises the memory-access overhead of the CPU and the computation overhead of the CPU. If it holds, jump to step A8; otherwise, jump to step A7.
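The GPU-suitability test of step A5 reduces to a cost comparison. A minimal sketch of that comparison follows; the individual cost terms are named by the patent but their estimation is not specified, so the function parameters here are illustrative assumptions.

```python
# Hypothetical cost model for step A5's GPU-suitability check. The cost
# terms are named in the source, but how they are estimated is not, so the
# parameters below are stand-ins.

def gpu_offload_wins(comm_cpu_gpu, gpu_mem, gpu_compute, cpu_mem, cpu_compute):
    """A5.2: offloading pays off only when the GPU's total overhead
    (CPU<->GPU communication + GPU memory access + GPU computation)
    is below plain CPU execution (CPU memory access + CPU computation)."""
    gpu_total = comm_cpu_gpu + gpu_mem + gpu_compute
    cpu_total = cpu_mem + cpu_compute
    return gpu_total < cpu_total

def suitable_for_gpu(data_parallel, **costs):
    """A5.1 + A5.2 combined: the segment must be data-parallel AND
    cheaper on the GPU than on the CPU."""
    return data_parallel and gpu_offload_wins(**costs)
```

The asymmetry is deliberate: the CPU side carries no communication term, since the data already resides in host memory.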
As shown in Fig. 4 and Fig. 7, the detailed steps of scheduling the instructions and data of a code segment onto the interpretation engine 13 are:
B1) Judge whether the code segment is digital signal processing, non-graphics multimedia processing, or graphics and image processing; if it is none of the three, jump to step B7; otherwise, jump to step B2.
B2) Judge whether the code segment is graphics and image processing; if so, jump to step B3; otherwise, jump to step B5.
B3) Judge whether the total overhead of assigning the code segment to the DSP for optimized execution is lower than the total overhead of assigning it to the GPU for optimized execution. The total overhead on the DSP comprises the communication overhead of exchanging data and instructions between the CPU and the DSP, the memory-access overhead of the DSP and the computation overhead of the DSP; the total overhead on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU and the computation overhead of the GPU. If it holds, jump to step B5; otherwise, jump to step B4.
B4) Judge whether the code segment is energy-consumption-priority; if so, jump to step B5; otherwise, jump to step B7.
B5) Judge whether the total overhead of assigning the code segment to the DSP for optimized execution is lower than the total overhead of executing it on the CPU, the latter comprising the memory-access overhead of the CPU and the computation overhead of the CPU; if so, jump to step B6; otherwise, jump to step B8.
B6) Schedule the instructions and data of the code segment onto the DSP for processing, and exit.
B7) Judge whether the gene interpretation of the code segment is suited to GPU acceleration; if so, schedule the instructions and data of the code segment onto the GPU for processing and exit; otherwise, jump to step B8.
B8) Schedule the instructions and data of the code segment onto the CPU for processing, and exit.
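The B1-B8 flow, including the GPU-suitability check of B7, can be sketched the same way. As before this is a hypothetical illustration: the classification flags and cost dictionary stand in for analyses the patent names without specifying.

```python
# Hypothetical sketch of the interpretation-engine scheduling flow (B1-B8,
# with B7.1-B7.4 factored into a helper). Flags and costs are stand-ins.

def schedule_interpret(seg, cost):
    """Return 'CPU', 'GPU' or 'DSP' for an interpretation code segment."""
    # B1: neither DSP-class signal processing, non-graphics multimedia,
    # nor graphics/image processing -> go straight to the GPU check (B7)
    if not (seg["dsp_class"] or seg["multimedia"] or seg["graphics"]):
        return _gpu_check(seg, cost)
    # B2/B3/B4: graphics segments compare DSP vs GPU before committing to DSP
    if seg["graphics"]:
        # B3: the DSP's total overhead does NOT beat the GPU's ...
        if cost["DSP"] >= cost["GPU"]:
            # B4: ... and the segment is not energy-priority -> GPU check (B7)
            if not seg["energy_priority"]:
                return _gpu_check(seg, cost)
    # B5: DSP beats plain CPU execution -> DSP (B6), else CPU (B8)
    return "DSP" if cost["DSP"] < cost["CPU"] else "CPU"

def _gpu_check(seg, cost):
    # B7.1/B7.2: graphics segments, or data-parallel segments, may use the GPU
    if seg["graphics"] or seg["data_parallel"]:
        # B7.3: only when the GPU's total overhead beats the CPU's (B7.4)
        if cost["GPU"] < cost["CPU"]:
            return "GPU"
    return "CPU"  # B8
```

Compared with the computing-engine flow, the DSP here plays the role the FPGA plays there: the energy-friendly target that wins ties for signal-processing workloads.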
As shown in Fig. 8, the detailed steps of step B7 are:
B7.1) Judge whether the code segment is graphics and image processing; if so, jump to step B7.3; otherwise, jump to step B7.2.
B7.2) Judge whether the code segment can be executed with data parallelism; if so, jump to step B7.3; otherwise, jump to step B8.
B7.3) Judge whether the total overhead of assigning the code segment to the GPU for optimized execution is lower than the total overhead of executing it on the CPU. The total overhead on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU and the computation overhead of the GPU; the total overhead on the CPU comprises the memory-access overhead of the CPU and the computation overhead of the CPU. If it holds, jump to step B7.4; otherwise, jump to step B8.
B7.4) Schedule the instructions and data of the code segment onto the GPU for processing, and exit.
In summary, the heterogeneous platform for gene data computation and interpretation of this embodiment can, at lower cost, meet the real-time and accuracy requirements of high-performance gene data computation, and meet the accuracy and readability requirements of high-cognition gene data interpretation.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions falling under the concept of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A heterogeneous platform for gene data computation and interpretation, characterized by comprising a heterogeneous processor unit (1), an interconnection bus module (2), a memory (3), a gene computation and interpretation data/instruction input unit (4) and a gene computation and interpretation result output unit (5); the heterogeneous processor unit (1) is connected through the interconnection bus module (2) to the memory (3), the input unit (4) and the result output unit (5); the heterogeneous processor unit (1) comprises a CPU, a GPU, a DSP and an FPGA, wherein the CPU constitutes a control engine (11), the CPU, GPU and FPGA together constitute a computing engine (12), and the CPU, GPU and DSP together constitute an interpretation engine (13); the control engine (11) receives gene computation and interpretation data and instructions through the input unit (4) and divides them into code segments; when the task type of a code segment is a control task, the instructions and data of the code segment are scheduled onto the CPU for processing; when the task type is a computing task, the instructions and data of the code segment are scheduled onto the computing engine (12) for processing and the result is output through the result output unit (5); and when the task type is an interpretation task, the instructions and data of the code segment are scheduled onto the interpretation engine (13) for processing and the result is output through the result output unit (5).
2. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that the FPGA comprises a crossbar switch, an IO control unit and an accelerator unit; the IO control unit and the accelerator unit are each connected to the crossbar switch; the accelerator unit comprises at least one of a hidden Markov model computation accelerator for hardware-accelerating hidden Markov model computation and a hash function computation accelerator for hardware-accelerating hash computation; and the IO control unit is connected to the interconnection bus module (2).
3. The heterogeneous platform for gene data computation and interpretation according to claim 2, characterized in that the IO control unit comprises a PCIE interface, a DMA controller, a PIU peripheral interface unit and a DDR controller; the crossbar switch is connected to the DMA controller, the PIU peripheral interface unit and the DDR controller; the DMA controller and the PIU peripheral interface unit are interconnected; the PCIE interface is connected to the DMA controller; and the PCIE interface and the DDR controller are each connected to the interconnection bus module (2).
4. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that the interconnection bus module (2) comprises an HCCLink bus module (21) and an HNCLink bus module (22); the CPU, GPU, DSP and FPGA are each connected to the memory (3) through the HCCLink bus module (21), and are each connected to the gene computation and interpretation data/instruction input unit (4) and the gene computation and interpretation result output unit (5) through the HNCLink bus module (22).
5. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that the gene computation and interpretation data/instruction input unit (4) comprises at least one of an input device, a general-purpose interface module, a network interface module, a multimedia input interface module, an external storage device and a sensor.
6. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that the gene computation and interpretation result output unit (5) comprises at least one of a display device, a general-purpose interface module, a network interface module, a multimedia output interface module and an external storage device.
7. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that the detailed steps of scheduling the instructions and data of a code segment onto the computing engine (12) comprise: A1) judging whether the code segment can be executed with instruction-level parallelism, with pipelining, or with data parallelism; if none of the three is possible, jumping to step A7 and exiting; otherwise jumping to step A2; A2) judging whether the code segment can only be executed with data parallelism; if so, jumping to step A3, otherwise jumping to step A6; A3) judging whether the total overhead of assigning the code segment to the FPGA for optimized (i.e. parallel, likewise below) execution is lower than the total overhead of assigning it to the GPU for optimized execution, the total overhead on the FPGA comprising the communication overhead of exchanging data and instructions between the CPU and the FPGA, the memory-access overhead of the FPGA and the computation overhead of the FPGA, and the total overhead on the GPU comprising the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU and the computation overhead of the GPU; if it holds, jumping to step A6, otherwise jumping to step A4; A4) judging whether the code segment is energy-consumption-priority; if so, jumping to step A6, otherwise jumping to step A5; A5) judging whether the gene computation of the code segment is suited to GPU acceleration; if so, jumping to step A8, otherwise jumping to step A7; A6) using all acceleration methods applicable on the FPGA, the acceleration methods comprising at least one of instruction-level parallelism, pipelining and data parallelism, judging whether the total overhead of assigning the code segment to the FPGA for optimized execution is lower than the total overhead of executing it on the CPU; if so, jumping to step A9, otherwise jumping to step A7; A7) scheduling the instructions and data of the code segment onto the CPU for processing and exiting; A8) scheduling the instructions and data of the code segment onto the GPU for processing and exiting; A9) scheduling the instructions and data of the code segment onto the FPGA for processing and exiting.
- 8. The heterogeneous platform for gene data computation and interpretation according to claim 7, characterized in that the detailed steps of step A5) comprise:
  A5.1) Determine whether the gene computation of the code segment can be executed data-parallel; if so, jump to step A5.2); otherwise, jump to step A7);
  A5.2) Determine whether the total overhead of assigning the code segment to the GPU for optimized execution is less than the total overhead of executing the code segment on the CPU. The total overhead of assigning the code segment to the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU, and the computing overhead of the GPU; the total overhead of executing the code segment on the CPU comprises the memory-access overhead of the CPU and the computing overhead of the CPU. If the condition holds, jump to step A8); otherwise, jump to step A7).
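The decision flow of steps A1)–A9), with the GPU-versus-CPU cost check of claim 8 folded into step A5), can be sketched as a small cost-based dispatcher. This is a minimal illustration only: the `ComputeSegment` field names, the helper functions, and the scalar cost model are assumptions made for the sketch, not part of the patented platform's interface.

```python
# Illustrative sketch of the computing-engine dispatch (steps A1-A9);
# all field names and the scalar cost model are hypothetical.
from dataclasses import dataclass

@dataclass
class ComputeSegment:
    instr_parallel: bool   # A1: supports instruction-level parallel execution
    pipeline: bool         # A1: supports pipelined execution
    data_parallel: bool    # A1: supports data-parallel execution
    energy_first: bool     # A4: energy consumption is prioritized
    gpu_suitable: bool     # A5: gene computation suits GPU acceleration
    fpga_cost: float       # CPU-FPGA communication + FPGA memory access + compute
    gpu_cost: float        # CPU-GPU communication + GPU memory access + compute
    cpu_cost: float        # CPU memory access + compute

def dispatch_compute(seg: ComputeSegment) -> str:
    # A1: no form of parallelism at all -> CPU (A7)
    if not (seg.instr_parallel or seg.pipeline or seg.data_parallel):
        return "CPU"
    # A2: only data parallelism is available
    if seg.data_parallel and not (seg.instr_parallel or seg.pipeline):
        # A3: FPGA cheaper than GPU, or A4: energy-first -> FPGA path (A6)
        if seg.fpga_cost < seg.gpu_cost or seg.energy_first:
            return _fpga_vs_cpu(seg)
        # A5 (refined by claim 8): GPU only if suitable and cheaper than CPU
        if seg.gpu_suitable and seg.gpu_cost < seg.cpu_cost:
            return "GPU"                                   # A8
        return "CPU"                                       # A7
    # instruction-level or pipeline parallelism available -> FPGA path (A6)
    return _fpga_vs_cpu(seg)

def _fpga_vs_cpu(seg: ComputeSegment) -> str:
    # A6: after combining all applicable FPGA acceleration methods, the FPGA
    # wins only if its total overhead still beats plain CPU execution
    return "FPGA" if seg.fpga_cost < seg.cpu_cost else "CPU"  # A9 / A7
```

In this reading, the CPU is the fallback target whenever no accelerator offers a net overhead reduction, which matches the claim's repeated "otherwise, jump to step A7)" branches.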
- 9. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that the detailed steps of dispatching the instructions and data of a code segment to the interpretation engine (13) for processing comprise:
  B1) Determine whether the code segment is digital signal processing, non-graphics multimedia processing, or graphics/image processing; if it is none of the three, jump to step B7); otherwise, jump to step B2);
  B2) Determine whether the code segment is graphics/image processing; if so, jump to step B3); otherwise, jump to step B5);
  B3) Determine whether the total overhead of assigning the code segment to the DSP for optimized execution is less than the total overhead of assigning it to the GPU for optimized execution. The total overhead of assigning the code segment to the DSP comprises the communication overhead of exchanging data and instructions between the CPU and the DSP, the memory-access overhead of the DSP, and the computing overhead of the DSP; the total overhead of assigning the code segment to the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU, and the computing overhead of the GPU. If the condition holds, jump to step B5); otherwise, jump to step B4);
  B4) Determine whether the code segment prioritizes energy consumption; if so, jump to step B5); otherwise, jump to step B7);
  B5) Determine whether the total overhead of assigning the code segment to the DSP for optimized execution is less than the total overhead of executing the code segment on the CPU, where the total overhead of executing the code segment on the CPU comprises the memory-access overhead of the CPU and the computing overhead of the CPU. If so, jump to step B6); otherwise, jump to step B8);
  B6) Dispatch the instructions and data of the code segment to the DSP for processing, and exit;
  B7) Determine whether the gene interpretation of the code segment is suitable for GPU acceleration; if so, dispatch the instructions and data of the code segment to the GPU for processing and exit; otherwise, jump to step B8);
  B8) Dispatch the instructions and data of the code segment to the CPU for processing, and exit.
- 10. The heterogeneous platform for gene data computation and interpretation according to claim 9, characterized in that the detailed steps of step B7) comprise:
  B7.1) Determine whether the code segment is graphics/image processing; if so, jump to step B7.3); otherwise, jump to step B7.2);
  B7.2) Determine whether the code segment can be executed data-parallel; if so, jump to step B7.3); otherwise, jump to step B8);
  B7.3) Determine whether the total overhead of assigning the code segment to the GPU for optimized execution is less than the total overhead of executing the code segment on the CPU. The total overhead of assigning the code segment to the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory-access overhead of the GPU, and the computing overhead of the GPU; the total overhead of executing the code segment on the CPU comprises the memory-access overhead of the CPU and the computing overhead of the CPU. If the condition holds, jump to step B7.4); otherwise, jump to step B8);
  B7.4) Dispatch the instructions and data of the code segment to the GPU for processing, and exit.
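The interpretation-engine dispatch of steps B1)–B8), with the GPU-suitability refinement of claim 10 folded into step B7), follows the same cost-comparison pattern with the DSP replacing the FPGA. As above, the `InterpretSegment` fields, helper names, and scalar costs below are illustrative assumptions, not the patent's actual interface.

```python
# Illustrative sketch of the interpretation-engine dispatch (steps B1-B8,
# with the B7 refinement of claim 10); all field names are hypothetical.
from dataclasses import dataclass

@dataclass
class InterpretSegment:
    is_dsp: bool           # B1: digital-signal-processing workload
    is_multimedia: bool    # B1: non-graphics multimedia processing
    is_graphics: bool      # B1/B2: graphics/image processing
    energy_first: bool     # B4: energy consumption is prioritized
    data_parallel: bool    # B7.2: supports data-parallel execution
    dsp_cost: float        # CPU-DSP communication + DSP memory access + compute
    gpu_cost: float        # CPU-GPU communication + GPU memory access + compute
    cpu_cost: float        # CPU memory access + compute

def dispatch_interpret(seg: InterpretSegment) -> str:
    # B1: none of the three workload classes -> GPU suitability check (B7)
    if not (seg.is_dsp or seg.is_multimedia or seg.is_graphics):
        return _gpu_vs_cpu(seg)
    # B2: graphics/image workloads weigh DSP against GPU first
    if seg.is_graphics:
        # B3: DSP cheaper than GPU, or B4: energy-first -> DSP path (B5)
        if seg.dsp_cost < seg.gpu_cost or seg.energy_first:
            return _dsp_vs_cpu(seg)
        return _gpu_vs_cpu(seg)                             # B7
    # non-graphics DSP/multimedia workloads -> DSP path (B5)
    return _dsp_vs_cpu(seg)

def _dsp_vs_cpu(seg: InterpretSegment) -> str:
    # B5: the DSP wins only if its total overhead beats CPU execution
    return "DSP" if seg.dsp_cost < seg.cpu_cost else "CPU"  # B6 / B8

def _gpu_vs_cpu(seg: InterpretSegment) -> str:
    # B7.1/B7.2: graphics or data-parallel work may be GPU-accelerated,
    # B7.3: but only when the GPU's total overhead beats the CPU's
    if (seg.is_graphics or seg.data_parallel) and seg.gpu_cost < seg.cpu_cost:
        return "GPU"                                        # B7.4
    return "CPU"                                            # B8
```

The two dispatchers differ only in which accelerator anchors the cost comparisons (FPGA for computation, DSP for interpretation), which is the structural parallel between claims 7 and 9.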
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710055557.9A CN106886690B (en) | 2017-01-25 | 2017-01-25 | A heterogeneous platform for gene data computation and interpretation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106886690A true CN106886690A (en) | 2017-06-23 |
CN106886690B CN106886690B (en) | 2018-03-09 |
Family
ID=59175921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710055557.9A Active CN106886690B (en) | A heterogeneous platform for gene data computation and interpretation | 2017-01-25 | 2017-01-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106886690B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120117318A1 (en) * | 2010-11-05 | 2012-05-10 | Src Computers, Inc. | Heterogeneous computing system comprising a switch/network adapter port interface utilizing load-reduced dual in-line memory modules (lr-dimms) incorporating isolation memory buffers |
CN102902511A (en) * | 2011-12-23 | 2013-01-30 | 同济大学 | Parallel information processing system |
CN103020002A (en) * | 2012-11-27 | 2013-04-03 | 中国人民解放军信息工程大学 | Reconfigurable multiprocessor system |
CN103310125A (en) * | 2012-03-06 | 2013-09-18 | 宁康 | High-performance metagenomic data analysis system on basis of GPGPU (General Purpose Graphics Processing Units) and multi-core CPU (Central Processing Unit) hardware |
CN104021042A (en) * | 2014-06-18 | 2014-09-03 | 哈尔滨工业大学 | Heterogeneous multi-core processor based on ARM, DSP and FPGA and task scheduling method |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108153190A (en) * | 2017-12-20 | 2018-06-12 | 福建新大陆电脑股份有限公司 | Artificial intelligence microprocessor |
CN110659112A (en) * | 2018-06-29 | 2020-01-07 | 中车株洲电力机车研究所有限公司 | Algorithm scheduling method and system |
CN109785905A (en) * | 2018-12-18 | 2019-05-21 | 中国科学院计算技术研究所 | Accelerator for gene alignment algorithms |
CN109785905B (en) * | 2018-12-18 | 2021-07-23 | 中国科学院计算技术研究所 | Accelerating device for gene comparison algorithm |
CN110333946A (en) * | 2019-05-14 | 2019-10-15 | 王娅雯 | Artificial-intelligence-based CPU data processing system and method |
CN110427262A (en) * | 2019-09-26 | 2019-11-08 | 深圳华大基因科技服务有限公司 | Gene data analysis method and heterogeneous scheduling platform |
CN110427262B (en) * | 2019-09-26 | 2020-05-15 | 深圳华大基因科技服务有限公司 | Gene data analysis method and heterogeneous scheduling platform |
CN111160546A (en) * | 2019-12-31 | 2020-05-15 | 深圳云天励飞技术有限公司 | Data processing system |
CN111506540A (en) * | 2020-04-24 | 2020-08-07 | 中国电子科技集团公司第五十八研究所 | Hardware programmable heterogeneous multi-core system on chip |
CN113268270A (en) * | 2021-06-07 | 2021-08-17 | 中科计算技术西部研究院 | Acceleration method, system and device for paired hidden Markov models |
CN113268270B (en) * | 2021-06-07 | 2022-10-21 | 中科计算技术西部研究院 | Acceleration method, system and device for paired hidden Markov models |
Also Published As
Publication number | Publication date |
---|---|
CN106886690B (en) | 2018-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106886690B (en) | A heterogeneous platform for gene data computation and interpretation | |
CN110197111B (en) | Acceleration unit for deep learning engine | |
Yu et al. | A deep learning prediction process accelerator based FPGA | |
CN105378494B (en) | Testing architecture with multiple hardware accelerator blocks based on FPGA for the multiple DUT of independent test | |
Ma et al. | Automatic compilation of diverse CNNs onto high-performance FPGA accelerators | |
CN103336672B (en) | Method for reading data, device and computing device | |
CN209388308U (en) | Universal data collection and signal processing system based on GPU and FPGA | |
CN108647368A | FPGA-based partial dynamic reconfiguration system and method |
CN104375805A (en) | Method for simulating parallel computation process of reconfigurable processor through multi-core processor | |
CN112580792B (en) | Neural network multi-core tensor processor | |
EP3000030A2 (en) | Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence | |
CN107066802A | A heterogeneous platform for gene data computation |
CN106897581B | A reconfigurable heterogeneous platform for gene data interpretation |
Huang et al. | IECA: An in-execution configuration CNN accelerator with 30.55 GOPS/mm² area efficiency | |
CN117195989A (en) | Vector processor, neural network accelerator, chip and electronic equipment | |
Belletti et al. | Ianus: an adaptive FPGA computer | |
Zhang et al. | Exploring HW/SW co-design for video analysis on CPU-FPGA heterogeneous systems | |
Geng et al. | A survey: Handling irregularities in neural network acceleration with fpgas | |
Huang et al. | EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | |
Zhou et al. | GCNear: A hybrid architecture for efficient GCN training with near-memory processing | |
CN106897582B | A heterogeneous platform for gene data interpretation |
Zhou et al. | Canary: Decentralized distributed deep learning via gradient sketch and partition in multi-interface networks | |
Chen et al. | Acceleration of Bucket-Assisted Fast Sample Entropy for Biomedical Signal Analysis | |
Zhan et al. | Accelerating queries of big data systems by storage-side CPU-FPGA co-design | |
CN103942235B | Distributed computing system and method for cross-comparison of large-scale datasets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||