[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN116560733B - Space target feature on-orbit real-time parallel LU decomposition computing system and method - Google Patents

Space target feature on-orbit real-time parallel LU decomposition computing system and method Download PDF

Info

Publication number
CN116560733B
CN116560733B CN202310827007.XA CN202310827007A CN116560733B CN 116560733 B CN116560733 B CN 116560733B CN 202310827007 A CN202310827007 A CN 202310827007A CN 116560733 B CN116560733 B CN 116560733B
Authority
CN
China
Prior art keywords
result
input
adder
calculation
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310827007.XA
Other languages
Chinese (zh)
Other versions
CN116560733A (en
Inventor
禹霁阳
黄丹
卢玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ordnance Science and Research Academy of China
Beijing Institute of Spacecraft System Engineering
Original Assignee
Ordnance Science and Research Academy of China
Beijing Institute of Spacecraft System Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ordnance Science and Research Academy of China, Beijing Institute of Spacecraft System Engineering filed Critical Ordnance Science and Research Academy of China
Priority to CN202310827007.XA priority Critical patent/CN116560733B/en
Publication of CN116560733A publication Critical patent/CN116560733A/en
Application granted granted Critical
Publication of CN116560733B publication Critical patent/CN116560733B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52Multiplying; Dividing
    • G06F7/535Dividing only
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a space target feature on-orbit real-time parallel LU decomposition computing system and a method, wherein the computing system comprises the following components: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set; the calculation method comprises the following steps: the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed; the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to the computing unit group, reads data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition and calculation to obtain an L matrix result and a U matrix result; the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register group and a U matrix register group; the LU parallel computing scheduling controller reads and outputs an L matrix result and a U matrix result. On the basis of ensuring the calculation precision and the executable frequency, the real-time LU decomposition of the space target features is realized.

Description

Space target feature on-orbit real-time parallel LU decomposition computing system and method
Technical Field
The invention relates to the technical field of space target feature calculation, in particular to an on-orbit real-time parallel LU decomposition calculation method for space target features.
Background
Currently, in the process of detecting a spatial target, in order to confirm whether a suspected target is a non-attention object such as noise or other fragments, analysis of target characteristics is required. There are two requirements in the calculation of the space target feature matrix: one is to ensure the real-time performance of calculation, and the other is to ensure the precision of calculation; conventional feature calculation is usually finished by a general floating point processor, and for large-scale suspected target elimination calculation, when the real-time requirement of the whole image calculation is in the millisecond order, the real-time performance cannot be ensured; by adopting a general fixed-point processor, the accuracy in the characteristic comparison process is difficult to ensure; the adoption of a high-performance processor such as a DSP (digital signal processor) requires additional heat dissipation design and a corresponding heat dissipation cold plate structure, and the structure is complex; the traditional statistical method is difficult to be used for feature discrimination of dozens of pixel levels, the recognition calculation amount based on CNN is large, and on-orbit application is difficult to be effectively realized. The LU decomposition can amplify the fine difference between the target and noise, so that the false alarm target can be removed rapidly.
However, conventional LU decomposition is partial block parallelism based on sparsity of input matrix values, or a pipeline process is reconstructed using a systolic structure. The designed architecture is difficult to be suitable for special complex application conditions, and meanwhile, as LU calculation elements have correlation from front to back, running water calculation is difficult to be effectively unfolded in a pulsation structure, so that the resolution calculation precision and the real-time performance of the resolution calculation are affected.
Therefore, how to improve the accuracy of the decomposition calculation on the basis of ensuring the real-time performance of the decomposition calculation is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a system and a method for on-orbit real-time LU decomposition and calculation of space target features, which provides a fully-pipelined parallel computing architecture for digital logic devices, designs the computing flow and time sequence of each sub-step by combing the correlation of each element in the computing process, and realizes real-time LU decomposition of space target features on the basis of ensuring computing precision and executable frequency.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
an on-orbit real-time parallel LU decomposition computing system of spatial target features, comprising: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set;
the LU parallel computing scheduling controller is respectively connected with the input register set, the output L matrix register set, the output U matrix register set and the computing unit PE;
the input register set is used for receiving and storing data to be decomposed;
the calculation unit group is used for carrying out parallel running water calculation on the input data to be decomposed to obtain an L matrix result and a U matrix result;
the output L matrix register set is used for storing the L matrix result to be output;
the output U matrix register set is used for storing the U matrix result to be output;
the LU parallel computing scheduling controller is used for controlling the input register set to receive and store data to be decomposed, generating a time sequence scheduling mechanism according to the computing unit set, reading the data to be decomposed through the time sequence scheduling mechanism, sequentially transmitting the data to be decomposed to the computing unit set for LU decomposition computation, and outputting the L matrix result and the U matrix result.
Preferably, each input register in the input register group is marked as
Each output L matrix register mark in the output L matrix register group is as follows
Each output U matrix register mark in the output U matrix register group is as follows
Where i and j represent two-dimensional register row and column identifications, for an N-dimensional matrix,and->All belong to positive integers.
Preferably, the computing unit group includes t computing unitsT is a positive integer;
the computing unitComprising the following steps: multiplier->Adder->Gate->And a reciprocal calculator;
the multiplier is used for executing multiplication operation;
the adder is used for executing addition operation;
the gate is used for selecting signals according toOutputting the received signal, f=1, 2,3,4,5,6;
the reciprocal calculator is used for carrying out table look-up, multiplication, subtraction and shift operation on the input data to be decomposed.
An on-orbit real-time parallel LU decomposition calculation method for spatial target features comprises the following steps:
the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed;
the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to a computing unit group, reads the data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition computation to obtain an L matrix result and a U matrix result;
the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register set and an output U matrix register set;
and the LU parallel computing scheduling controller reads and outputs the L matrix result and the U matrix result.
Preferably, the maximum of the data to be decomposed, the L matrix result and the U matrix result is 6x6 matrix;
the computing unit group comprises t computing units for parallel computingWherein t is a positive integer of 8 or less; the computing element->Comprising the following steps: multiplier->Adder->Gate->And a reciprocal calculator, f=1, 2,3,4,5,6.
Preferably, reading the data to be decomposed by the timing schedule mechanism, and sequentially transmitting the data to be decomposed to the computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the computing unit group to complete LU decomposition computation after 35 periods;
step 2.1. Input registerDirect assignment:The process is completed in the 1 st period;representing an output U matrix register;
step 2.2. By means of a calculation unitThe reciprocal calculator in (2) calculates +.3 in cycle 3>According to>By means of a computing unit->To->Parallel computing to get->Is calculated according to the calculation result of (2);Representing an L matrix register;
step 2.3 according toBy means of a computing unit->To->Parallel computation, get +.5 in period>Is calculated according to the calculation result of (2);
step 2.4. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>By means of a computing unit->To->Parallel computation, get +.9 in period>Is calculated according to the calculation result of (2);
step 2.5. According toBy means of a computing unit->To the point ofParallel computation, get +.12 in cycle>Is calculated according to the calculation result of (2);
step 2.6. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates the value obtained in 15 th periodAccording to>By means of a computing unit->To->Parallel computation gets +.>Is calculated according to the calculation result of (2);
step 2.7. According toBy means of a computing unit->To->Parallel computation, get +.19 in 19 th cycle>Is calculated according to the calculation result of (2);
step 2.8. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>By means of a computing unit->To->Parallel computation, get +.23 in cycle>Is calculated according to the calculation result of (2);
step 2.9. According toBy means of a computing unit->To->Parallel computation, get +.27 th cycle>Is calculated according to the calculation result of (2);
step 2.10. By means of a calculation unitThe reciprocal calculator in (2) gets +.>According to>By means of a computing unit->Calculate to get +.>Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit->Calculate to get +.35 in cycle 35>Is calculated by the computer.
Preferably, a single said computing unitUp to 12 input data are processed simultaneously +.>
Preferably, the computing unitProcessing input data +.>The process of (1) comprises:
enters a reciprocal calculator and passes through three clock cycles and a gating device +.>At->The output results are multiplied, and the obtained result is then added with +.>At->The output results are added;Represents a selection signal, f=1, 2,3,4,5,6;
and->Input to multiplier->The result obtained is passed through a gating device +.>When->Time-selective input adder->When->Time-selective input adder->
And->Input to multiplier->The result obtained is input to the adder->
And->Input to multiplier->The result obtained is input to the adder->
And->Input to multiplier->The obtained result is subjected to->When->Time selection input adderWhen->Time-selective input adder->
And->Input to multiplier->The result obtained is input to the adder->
Input to a gateWhen->Time-selective input adder->When->Time-selective input adder->
Preferably, the computing unitThe process of processing the input data further comprises:
adder deviceThe output of (2) is passed through the gate->When->Time-selective input adder->When->Time-selective input adder->
Adder deviceThe output of (2) is directly to adder->
Adder deviceThe output of (2) is directly to adder->
Adder deviceThe output of (2) is passed through the gate->When->Time-selective input multiplier->When->Time-selective input adder->
Adder deviceThe output of (2) is passed through the gate->When->Output as final result->When->Time-selective input multiplier->
Preferably, the reciprocal calculator data processing procedure includes:
preprocessing the data S to be calculated, and right-shifting the decimal point to the right side of the first 1 to obtain processed dataAnd recording the shift times p;
processing the dataBinary representation is performed to obtain +.>And splitting said binary representation to obtain +.>And->And->Representation->And->Is a bit width of (2);
as address input look-up table, get the result +.>
The calculation formula of the reciprocal calculator is as follows:
wherein,,representing a binary representation operation on a variable, +.>Initially 0, & gt>Representing a look-up operation->
Compared with the prior art, the invention discloses and provides the space target feature on-orbit real-time parallel LU decomposition computing system and method, which have the following beneficial effects:
1. the hardware parallel computing system designed by the invention can be suitable for programming application of digital logic devices, the design architecture considers the maximum executable frequency in the hardware digital logic design process, the occupied resources are limited, and the stored matrix data can be subjected to 1:1 real-time LU decomposition processing.
2. The invention uses a computing unitThe gating signals in the system realize that different data are subjected to LU decomposition calculation under a unified architecture, and adapt to parallel calculation application of targets with different sizes.
3. The invention adopts fixed-point calculation, adopts a mode of combining table look-up and expansion approximation for division operation, saves hardware resources, improves calculation precision, and is suitable for application of large, medium and small digital logic devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a space target feature parallel LU decomposition computing system provided by the invention.
Fig. 2 is a flow chart of a space target feature parallel LU decomposition method provided by the present invention.
Fig. 3a is a timing diagram of the 1 st cycle to 14 th cycle of the timing schedule mechanism according to the present invention.
Fig. 3b is a timing diagram of 14 th cycle to 27 th cycle of the timing schedule mechanism according to the present invention.
Fig. 3c is a timing diagram of 27 th cycle to 35 th cycle of the timing schedule mechanism according to the present invention.
Fig. 4 is a schematic diagram of a computing unit PE structure and data processing according to the present invention.
Fig. 5 is a schematic diagram of a reciprocal calculator structure and data processing according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention discloses an on-orbit real-time parallel LU decomposition computing system for spatial target features, including: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set;
the LU parallel computing scheduling controller is respectively connected with the input register set, the output L matrix register set, the output U matrix register set and the computing unit PE;
the input register set is used for receiving and storing data to be decomposed;
the computing unit group is used for carrying out parallel running water computation on the input data to be decomposed to obtain an L matrix result and a U matrix result;
the output L matrix register set is used for storing an L matrix result to be output;
the output U matrix register set is used for storing the U matrix result to be output;
the LU parallel computing scheduling controller is used for controlling the input register set to receive and store data to be decomposed, generating a time sequence scheduling mechanism according to the computing unit set, reading the data to be decomposed through the time sequence scheduling mechanism, sequentially transmitting the data to be decomposed to the computing unit set for LU decomposition computation, and outputting an L matrix result and a U matrix result.
Preferably, the output L matrix register set is further configured to store temporary variables of the L matrix in the iterative process; the output U matrix register set is also used for storing temporary variables of the U matrix in the iterative process.
Preferably, the input register set is a collection of register hardware, each register being marked asThe method comprises the steps of carrying out a first treatment on the surface of the The output L matrix register set is a set of register hardware, each register is marked +.>The method comprises the steps of carrying out a first treatment on the surface of the The output U matrix register set is a collection of register hardware, each register is marked +.>The method comprises the steps of carrying out a first treatment on the surface of the Where i and j represent two-dimensional register row and column identifications, for an N-dimensional matrix,and->All belong to positive integers.
Preferably, the group of computing units comprises t computing unitsT is a positive integer;
each computing unitAll include: multiplier->Adder->Gate->And a reciprocal calculator;
the multiplier is used for executing multiplication operation;
the adder is used for executing addition operation;
the gate is used for selecting signals according toOutputting the received signal, f=1, 2,3,4,5,6;
the reciprocal calculator is used for carrying out table look-up, multiplication, subtraction and shift operation on the input data to be decomposed.
As shown in fig. 2, the embodiment of the invention discloses an on-orbit real-time parallel LU decomposition calculation method for space target features, which comprises the following steps:
the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed;
the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to the computing unit group, reads data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition and calculation to obtain an L matrix result and a U matrix result;
the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register set and an output U matrix register set;
the LU parallel computing scheduling controller reads and outputs an L matrix result and a U matrix result.
Preferably, the maximum of the data to be decomposed, the L matrix result and the U matrix result is 6x6 matrix;
the computing unit group comprises t computing units for parallel computingWherein t is a positive integer of 8 or less; computing element->Comprising the following steps: multiplier->Adder->Gate->And a reciprocal calculator, f=1, 2,3,4,5,6.
Preferably, as shown in fig. 3a-c, reading data to be decomposed by a timing scheduling mechanism, and sequentially transmitting the data to be decomposed to a computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the calculation unit group to complete LU decomposition calculation after 35 periods;
step 2.1. Input registerDirect assignment:The process is completed in the 1 st period;representing an output U matrix register;
step 2.2. By means of a calculation unitThe reciprocal calculator in (2) calculates +.3 in cycle 3>According to>By means of a computing unit->To->Parallel computing to get->Is calculated according to the calculation result of (2);Representing an L matrix register;
step 2.3 according toBy means of a computing unit->To->Parallel computation, get +.5 in period>Is calculated according to the calculation result of (2);
step 2.4. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>By means of a computing unit->To->Parallel computation, get +.9 in period>Is calculated according to the calculation result of (2);
step 2.5. According toBy means of a computing unit->To the point ofParallel computation, get +.12 in cycle>Is calculated according to the calculation result of (2);
step 2.6. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates the value obtained in 15 th periodAccording to>By means of a computing unit->To->Parallel computation gets +.>Is calculated according to the calculation result of (2);
step 2.7. According toBy means of a computing unit->To->Parallel computation, get +.19 in 19 th cycle>Is calculated according to the calculation result of (2);
step 2.8. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>By means of a computing unit->To->Parallel computation, get +.23 in cycle>Is calculated according to the calculation result of (2);
step 2.9. According toBy means of a computing unit->To->Parallel computation, get +.27 th cycle>Is calculated according to the calculation result of (2);
step 2.10. By means of a calculation unitThe reciprocal calculator in (2) gets +.>According to>By means of a computing unit->Calculate to get +.>Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit->Calculate to get +.35 in cycle 35>Calculation of (2)As a result.
Preferably, the period represents a clock period, and the time between the rising edge of one clock and the next rising edge represents one period.
Preferably, the calculation content of each clock cycle in 35 cycles is as follows:
preferably, a single computing unitUp to 12 input data are processed simultaneously +.>
Preferably, as shown in FIG. 4, the computing unitProcessing input data +.>The process of (1) comprises:
enters a reciprocal calculator and passes through three clock cycles and a gating device +.>At->The output results are multiplied, and the obtained result is then added with +.>At->The output results are added;Represents a selection signal, f=1, 2,3,4,5,6;And->Input to multiplier->The result obtained is passed through a gating device +.>When->Time-selective input adder->When->Time-selective input adder->And->Input to multiplier->The obtained result is input into an adderAnd->Input to multiplier->The result obtained is input to the adder->And->Input to multiplier->The obtained result is subjected to->When->Time-selective input adder->When->Time-selective input adder->And->Input to multiplier->The result obtained is input to the adder->Input to the gating device->When (when)Time-selective input adder->When->Time-selective input adder->
Adder deviceThe output of (2) is passed through the gate->When->Time-selective input adder->When->Time-selective input adder->The method comprises the steps of carrying out a first treatment on the surface of the Adder->The output of (2) is directly to adder->The method comprises the steps of carrying out a first treatment on the surface of the Adder->Is directly to the adderThe method comprises the steps of carrying out a first treatment on the surface of the Adder->The output of (2) is passed through the gate->When->Time-selective input multiplier->When->Time-selective input adder->The method comprises the steps of carrying out a first treatment on the surface of the Adder->The output of (2) is passed through the gate->When->Output as final result->When (when)Time-selective input multiplier->
Preferably, as shown in fig. 5, the reciprocal calculator data processing process includes:
preprocessing the data S to be calculatedThe decimal point moves to the right of the first 1 to obtain the processed dataAnd recording the shift times p;
will process the dataBinary representation is performed to obtain +.>And splitting the binary representation to obtain +.>And->And->Representation->And->Is a bit width of (2);
preferably, willBinary representation is performed to obtain +.>And splitting the binary representation to obtain +.>And->And->Representation->And->Is a bit width of (2);
as address input look-up table, get the result +.>And->Multiplication is performed with the result subtracted by 1, the subtraction result and +.>Multiplying, shifting the multiplication result left by p bits to obtain the final reciprocal calculation estimation value +.>
The calculation formula of the reciprocal calculator is as follows:
wherein,,representing a binary representation operation on a variable, +.>Initially 0, & gt>Representing a look-up operation->
Preferably, the method comprises the steps of,the larger the bit width of the table is, the higher the accuracy of the reciprocal calculation estimation value is, the more resources are occupied by table lookup, and the table lookup depth below 12-bit addresses is adopted. />
Preferably, the LU parallel computing scheduling controller transmits and stores LU decomposition computing results to the output L matrix register group and the U matrix register group respectively according to the first-come-first-served principle, and if the LU decomposition computing results arrive at the same time, the LU decomposition computing results are stored according to the number sequence number from small to large.
Preferably, the present invention discloses and provides a space target feature on-orbit real-time parallel LU decomposition computing system and method, which has the following beneficial effects compared with the prior art:
1. the hardware parallel computing system designed by the invention can be suitable for programming application of digital logic devices, the design architecture considers the maximum executable frequency in the hardware digital logic design process, the occupied resources are limited, and the stored matrix data can be subjected to 1:1 real-time LU decomposition processing.
2. The invention uses a computing unitThe gating signals in the system realize that different data are subjected to LU decomposition calculation under a unified architecture, and adapt to parallel calculation application of targets with different sizes.
3. The invention adopts fixed-point calculation, adopts a mode of combining table look-up and expansion approximation for division operation, saves hardware resources, improves calculation precision, and is suitable for application of large, medium and small digital logic devices.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. An on-orbit real-time parallel LU decomposition computing system for spatial target features, comprising: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set;
the LU parallel computing scheduling controller is respectively connected with the input register set, the output L matrix register set, the output U matrix register set and the computing unit PE;
the input register set is used for receiving and storing data to be decomposed;
the calculation unit group is used for carrying out parallel running water calculation on the input data to be decomposed to obtain an L matrix result and a U matrix result;
the output L matrix register set is used for storing the L matrix result to be output;
the output U matrix register set is used for storing the U matrix result to be output;
the LU parallel computing scheduling controller is used for controlling the input register set to receive and store data to be decomposed, generating a time sequence scheduling mechanism according to the computing unit set, reading the data to be decomposed through the time sequence scheduling mechanism, sequentially transmitting the data to be decomposed to the computing unit set for LU decomposition computation, and outputting the L matrix result and the U matrix result;
reading the data to be decomposed through the time schedule mechanism, and sequentially transmitting the data to be decomposed to the computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the computing unit group to complete LU decomposition computation after 35 periods;
step 2.1. Input register a (i,j) Direct assignment: a, a (1,i) =U (1,i) ,i∈[1,2,3,4,5,6]The process is completed in the 1 st period; u (U) (i,j) Representing an output U matrix register;
step 2.2. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 3 rd period (1,1) According to the calculation result of L at the same time (j,1) =a (j,1) (1/U (1,1) ),j∈[2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing to obtain L (j,1) Is calculated according to the calculation result of (2); l (L) (i,j) Representing an L matrix register;
step 2.3. According to U (2,i) =a (2,i) -L (2,1) U (1,i) ,i∈[1,2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing, obtaining U in the 5 th period (2,i) Is calculated according to the calculation result of (2);
step 2.4. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 8 th period (2,2) According to the calculation result of L at the same time (j,2) =a (j,2) -(L (j,1) U (1,2) /U (2,2) ),j∈[3,4,5,6]By means of a calculation unit PE 2 To PE 5 : parallel computing, obtaining L in 9 th period (j,2) Is calculated according to the calculation result of (2);
step 2.5. According toBy means of a computing unit PE 1 To PE 4 Parallel computing, obtaining U in 12 th period (3,i) Is calculated according to the calculation result of (2);
step 2.6. By calculationUnit PE 1 The reciprocal calculator in (2) calculates 1/U at 15 th period (3,3) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 4 Parallel computation to get L at 16 th cycle (j,3) Is calculated according to the calculation result of (2);
step 2.7. According toBy means of a computing unit PE 1 To PE 3 Parallel computing, obtaining U in 19 th period (4,i) Is calculated according to the calculation result of (2);
step 2.8. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at 22 th period (4,4) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 3 Parallel computing, obtaining L in 23 rd period (j,4) Is calculated according to the calculation result of (2);
step 2.9. According toBy means of a computing unit PE 1 To PE 2 Parallel computing, obtaining U in 27 th period (5,i) Is calculated according to the calculation result of (2);
step 2.10. By means of the calculation unit PE 1 The reciprocal calculator in (2) obtains 1/U in 30 th period (5,5) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 Calculation to obtain L at cycle 32 (6,5) Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit PE 1 Calculate U at 35 th cycle (6,6) Is calculated according to the calculation result of (2);
calculation unit PE t Processing input data m k The process of (1) comprises:
m 0 enters a reciprocal calculator and passes through three clock cycles and a gating device MUX 5 At Sel 5 Multiplying the output result when=0, and then the obtained result is multiplied by MUX 6 At Sel 6 Output results when=0 are added; sel (Sel) f Represents a selection signal, f=1, 2,3,4,5,6;
m 1 and m is equal to 2 Input to multiplier M 1 The result obtained is passed through a gate MUX 1 When Sel 1 Select input adder a when=0 5 When Sel 1 Select input adder a when=1 1
m 3 And m is equal to 4 Input to multiplier M 2 The obtained result is input into an adder A 1
m 5 And m is equal to 6 Input to multiplier M 3 The obtained result is input into an adder A 2
m 7 And m is equal to 8 Input to multiplier M 4 The result obtained is passed through MUX 3 When Sel 2 Select input adder a when=0 2 When Sel 2 Select input adder a when=1 5
m 9 And m is equal to 10 Input to multiplier M 5 The obtained result is input into an adder A 3
m 11 Input to a gate MUX 6 When Sel 6 Select input adder a when=0 6 When Sel 6 Select input adder a when=1 3
The process of processing the input data by the computing unit PEt further comprises:
adder A 1 The output of (a) passes through a gate MUX 2 When Sel 3 Select input adder a when=0 5 When Sel 3 Select input adder a when=1 4
Adder A 2 The output of the adder is directly connected to the adder A 4
Adder A 3 The output of the adder is directly connected to the adder A 5
Adder A 4 The output of (a) passes through a gate MUX 4 When Sel 4 Select input multiplier M when=0 6 When Sel 4 Select input adder a when=1 5
Adder A 5 The output of (a) passes through a gate MUX 5 When Sel 5 Output as final result n when=0 0 When Sel 5 Select input multiplier M when=1 6
The reciprocal calculator data processing process includes:
preprocessing the data S to be calculated, and right-shifting the decimal point to the right side of the first 1 to obtain processed data f trans (S) and recording the shift number p;
processing the data f trans (S) performing binary representation to obtain [ S ] M-1 ,s M-2 ,…,s N ,s N-1 ,…,s 0 ] 2 And split the binary representation to obtain x S =[s M-1 ,s M-2 ,…,s N ] 2 And D S =[s N-1 ,s N-2 ,…,s 0 ] 2 The method comprises the steps of carrying out a first treatment on the surface of the M and N represent x S And D S Is a bit width of (2);
x S as an address input lookup table to obtain the result f Table (x S );
The calculation formula of the reciprocal calculator is as follows:
wherein [ (S)] 2 Representing binary representation operations on variables, M.gtoreq.N, p is initially 0, f Table (. Cndot.) means a look-up table operation,
2. the spatial target feature on-orbit real-time parallel LU decomposition calculation system according to claim 1, wherein each input register in the input register set is labeled a (i,j)
Each output L matrix register mark in the output L matrix register group is L (i,j)
Each output U matrix register mark in the output U matrix register group is U (i,j)
Wherein i and j represent two-dimensional register row and column identifications, and for an N-dimensional matrix, i, j E [1, N ] and i, j belong to positive integers.
3. The on-orbit real-time parallel LU decomposition computing system according to claim 1, wherein the computing unit group includes t computing units PE t T is a positive integer;
the computing unit PE t Comprising the following steps: multiplier M f Adder A f Gate MUX f And a reciprocal calculator;
the multiplier is used for executing multiplication operation;
the adder is used for executing addition operation;
the gate is used for selecting the signal Sel according to f Outputting the received signal, f=1, 2,3,4,5,6;
the reciprocal calculator is used for carrying out table look-up, multiplication, subtraction and shift operation on the input data to be decomposed.
4. The on-orbit real-time parallel LU decomposition calculation method for the space target features is characterized by comprising the following steps of:
the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed;
the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to a computing unit group, reads the data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition computation to obtain an L matrix result and a U matrix result;
the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register group and a U matrix register group;
the LU parallel computing scheduling controller reads and outputs the L matrix result and the output U matrix result;
reading the data to be decomposed through the time schedule mechanism, and sequentially transmitting the data to be decomposed to the computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the computing unit group to complete LU decomposition computation after 35 periods;
step 2.1. Input register a (i,j) Direct assignment: a, a (1,i) =U (1,i) ,i∈[1,2,3,4,5,6]The process is completed in the 1 st period; u (U) (i,j) Representing an output U matrix register;
step 2.2. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 3 rd period (1,1) According to the calculation result of L at the same time (j,1) =a (j,1) (1/U (1,1) ),j∈[2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing to obtain L (j,1) Is calculated according to the calculation result of (2); l (L) (i,j) Representing an L matrix register;
step 2.3. According to U (2,i) =a (2,i) -L (2,1) U (1,i) ,i∈[1,2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing, obtaining U in the 5 th period (2,i) Is calculated according to the calculation result of (2);
step 2.4. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 8 th period (2,2) According to the calculation result of L at the same time (j,2) =a (j,2) -(L (j,1) U (1,2) /U (2,2) ),j∈[3,4,5,6]By means of a calculation unit PE 2 To PE 5 : parallel computing, obtaining L in 9 th period (j,2) Is calculated according to the calculation result of (2);
step 2.5. According toBy means of a computing unit PE 1 To PE 4 Parallel computing, obtaining U in 12 th period (3,i) Is calculated according to the calculation result of (2);
step 2.6. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at 15 th period (3,3) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 4 Parallel computation to get L at 16 th cycle (j,3) Is calculated according to the calculation result of (2);
step 2.7. According toBy means of a computing unit PE 1 To PE 3 Parallel computing, obtaining U in 19 th period (4,i) Is calculated according to the calculation result of (2);
step 2.8. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at 22 th period (4,4) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 3 Parallel computing, obtaining L in 23 rd period (j,4) Is calculated according to the calculation result of (2);
step 2.9. According toBy means of a computing unit PE 1 To PE 2 Parallel computing, obtaining U in 27 th period (5,i) Is calculated according to the calculation result of (2);
step 2.10. By means of the calculation unit PE 1 The reciprocal calculator in (2) obtains 1/U in 30 th period (5,5) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 Calculation to obtain L at cycle 32 (6,5) Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit PE 1 Calculate U at 35 th cycle (6,6) Is calculated according to the calculation result of (2);
the computing unit PEt processes the input data m k The process of (1) comprises:
m 0 enters a reciprocal calculator and passes through three clock cycles and a gating device MUX 5 At Sel 5 Multiplying the output result when=0, and then the obtained result is multiplied by MUX 6 At Sel 6 Output results when=0 are added; sel (Sel) f Represents a selection signal, f=1, 2,3,4,5,6;
m 1 and m is equal to 2 Input to multiplier M 1 The result obtained is passed through a gate MUX 1 When Sel 1 Select input adder a when=0 5 When Sel 1 Select input adder a when=1 1
m 3 And m is equal to 4 Input to multiplier M 2 The obtained result is input into an adder A 1
m 5 And m is equal to 6 Input to multiplier M 3 The obtained result is input into an adder A 2
m 7 And m is equal to 8 Input to multiplier M 4 The result obtained is passed through MUX 3 When Sel 2 Select input adder a when=0 2 When Sel 2 Select input adder a when=1 5
m 9 And m is equal to 10 Input to multiplier M 5 The obtained result is input into an adder A 3
m 11 Input to a gate MUX 6 When Sel 6 Select input adder a when=0 6 When Sel 6 Select input adder a when=1 3
The process of processing the input data by the computing unit PEt further comprises:
adder A 1 The output of (a) passes through a gate MUX 2 When Sel 3 Select input adder a when=0 5 When Sel 3 Select input adder a when=1 4
Adder A 2 The output of the adder is directly connected to the adder A 4
Adder A 3 The output of the adder is directly connected to the adder A 5
Adder A 4 The output of (a) passes through a gate MUX 4 When Sel 4 Select input multiplier M when=0 6 When Sel 4 Select input adder a when=1 5
Adder A 5 The output of (a) passes through a gate MUX 5 When Sel 5 Output as final result n when=0 0 When Sel 5 Select input multiplier M when=1 6
The reciprocal calculator data processing process includes:
preprocessing the data S to be calculated, and right-shifting the decimal point to the right side of the first 1 to obtain processed data f trans (S) and recording the shift number p;
processing the data f trans (S) performing binary representation to obtain [ S ] M-1 ,s M-2 ,…,s N ,s N-1 ,…,s 0 ] 2 And split the binary representation to obtain x S =[s M-1 ,s M-2 ,…,s N ] 2 And D S =[s N-1 ,s N-2 ,…,s 0 ] 2 The method comprises the steps of carrying out a first treatment on the surface of the M and N represent x S And D S Is a bit width of (2);
x S as an address input lookup table to obtain the result f Table (x S );
The calculation formula of the reciprocal calculator is as follows:
wherein [ (S)] 2 Representing binary representation operations on variables, M.gtoreq.N, P initially being 0, f Table (. Cndot.) means a look-up table operation,
5. the on-orbit real-time parallel LU decomposition calculation method of space target features according to claim 4, wherein the maximum of the data to be decomposed, the L matrix result and the U matrix result is a 6x6 matrix;
the computing unit group comprises t computing units PEt for parallel computing, wherein t is a positive integer less than or equal to 8; the computing unit PE t Comprising the following steps: multiplier M f Adder A f Gate MUX f And a reciprocal calculator, f=1, 2,3,4,5,6.
6. The on-orbit real-time parallel LU decomposition calculation method of space object features according to claim 4, wherein a single calculation unit PEt processes at most 12 input data m simultaneously k ,K∈[0,1,…,11]。
CN202310827007.XA 2023-07-07 2023-07-07 Space target feature on-orbit real-time parallel LU decomposition computing system and method Active CN116560733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310827007.XA CN116560733B (en) 2023-07-07 2023-07-07 Space target feature on-orbit real-time parallel LU decomposition computing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310827007.XA CN116560733B (en) 2023-07-07 2023-07-07 Space target feature on-orbit real-time parallel LU decomposition computing system and method

Publications (2)

Publication Number Publication Date
CN116560733A CN116560733A (en) 2023-08-08
CN116560733B true CN116560733B (en) 2023-10-24

Family

ID=87502175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310827007.XA Active CN116560733B (en) 2023-07-07 2023-07-07 Space target feature on-orbit real-time parallel LU decomposition computing system and method

Country Status (1)

Country Link
CN (1) CN116560733B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004079585A1 (en) * 2003-03-07 2004-09-16 Matsushita Electric Industrial Co., Ltd. Matrix operating device
CN107341133A (en) * 2017-06-24 2017-11-10 中国人民解放军信息工程大学 The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions LU Decomposition
CN110457648A (en) * 2019-07-30 2019-11-15 暨南大学 A kind of implementation method of the systolic array architecture decomposed for LU
CN114996649A (en) * 2022-05-09 2022-09-02 深圳市国微电子有限公司 Method for realizing matrix decomposition and lower triangular matrix inversion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004079585A1 (en) * 2003-03-07 2004-09-16 Matsushita Electric Industrial Co., Ltd. Matrix operating device
CN107341133A (en) * 2017-06-24 2017-11-10 中国人民解放军信息工程大学 The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions LU Decomposition
CN110457648A (en) * 2019-07-30 2019-11-15 暨南大学 A kind of implementation method of the systolic array architecture decomposed for LU
CN114996649A (en) * 2022-05-09 2022-09-02 深圳市国微电子有限公司 Method for realizing matrix decomposition and lower triangular matrix inversion

Also Published As

Publication number Publication date
CN116560733A (en) 2023-08-08

Similar Documents

Publication Publication Date Title
US11698773B2 (en) Accelerated mathematical engine
CN110163360B (en) Computing device and method
CN115039067A (en) Systolic array including fused multiply accumulate with efficient pre-normalization and extended dynamic range
EP0275979A2 (en) Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples
JP3228927B2 (en) Processor element, processing unit, processor, and arithmetic processing method thereof
CN110766128A (en) Convolution calculation unit, calculation method and neural network calculation platform
CN116560733B (en) Space target feature on-orbit real-time parallel LU decomposition computing system and method
CN101426134A (en) Hardware device and method for video encoding and decoding
CN108762719B (en) Parallel generalized inner product reconstruction controller
Mohanty et al. Design and performance analysis of fixed-point jacobi svd algorithm on reconfigurable system
CN106775579A (en) Floating-point operation accelerator module based on configurable technology
CN116888591A (en) Matrix multiplier, matrix calculation method and related equipment
CN110674456B (en) Time-frequency conversion method of signal acquisition system
US10127040B2 (en) Processor and method for executing memory access and computing instructions for host matrix operations
CN116578819A (en) Sparse fraction Fourier transform FPGA implementation method and system
Sankaran et al. Design and Implementation of 1024 Point Pipelined Radix 4 FFT Processor on FPGA for Biomedical Signal Processing Applications
Chaudhari et al. An optimized approach to pipelined architecture for fast 2D normalized cross-correlation
Ghaffari et al. A fully pipelined FPGA architecture for multiscale BRISK descriptors with a novel hardware-aware sampling pattern
Khamaneh et al. Real–time memory efficient SLIC accelerator for low–power applications
JPH0766372B2 (en) Floating point processor
Hsiao et al. Design of a low-cost floating-point programmable vertex processor for mobile graphics applications based on hybrid number system
Chen et al. Edge FPGA-based onsite neural network training
Chen et al. A high-throughput and area-efficient video transform core with a time division strategy
TWI564735B (en) Data allocating apparatus, signal processing apparatus, and data allocating method
CN114900703B (en) Universal video coding system and data processing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant