CN116560733B - Space target feature on-orbit real-time parallel LU decomposition computing system and method - Google Patents
Space target feature on-orbit real-time parallel LU decomposition computing system and method Download PDFInfo
- Publication number
- CN116560733B CN116560733B CN202310827007.XA CN202310827007A CN116560733B CN 116560733 B CN116560733 B CN 116560733B CN 202310827007 A CN202310827007 A CN 202310827007A CN 116560733 B CN116560733 B CN 116560733B
- Authority
- CN
- China
- Prior art keywords
- result
- input
- adder
- calculation
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000354 decomposition reaction Methods 0.000 title claims abstract description 47
- 238000000034 method Methods 0.000 title claims abstract description 41
- 238000004364 calculation method Methods 0.000 claims abstract description 125
- 239000011159 matrix material Substances 0.000 claims abstract description 110
- 238000007781 pre-processing Methods 0.000 claims description 4
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 claims description 4
- 238000010586 diagram Methods 0.000 description 6
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000012938 design process Methods 0.000 description 2
- 230000017525 heat dissipation Effects 0.000 description 2
- 238000012804 iterative process Methods 0.000 description 2
- 230000000630 rising effect Effects 0.000 description 2
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 239000012634 fragment Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000010349 pulsation Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/535—Dividing only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/57—Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
- G06F7/575—Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a space target feature on-orbit real-time parallel LU decomposition computing system and a method, wherein the computing system comprises the following components: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set; the calculation method comprises the following steps: the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed; the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to the computing unit group, reads data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition and calculation to obtain an L matrix result and a U matrix result; the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register group and a U matrix register group; the LU parallel computing scheduling controller reads and outputs an L matrix result and a U matrix result. On the basis of ensuring the calculation precision and the executable frequency, the real-time LU decomposition of the space target features is realized.
Description
Technical Field
The invention relates to the technical field of space target feature calculation, in particular to an on-orbit real-time parallel LU decomposition calculation method for space target features.
Background
Currently, in the process of detecting a spatial target, in order to confirm whether a suspected target is a non-attention object such as noise or other fragments, analysis of target characteristics is required. There are two requirements in the calculation of the space target feature matrix: one is to ensure the real-time performance of calculation, and the other is to ensure the precision of calculation; conventional feature calculation is usually finished by a general floating point processor, and for large-scale suspected target elimination calculation, when the real-time requirement of the whole image calculation is in the millisecond order, the real-time performance cannot be ensured; by adopting a general fixed-point processor, the accuracy in the characteristic comparison process is difficult to ensure; the adoption of a high-performance processor such as a DSP (digital signal processor) requires additional heat dissipation design and a corresponding heat dissipation cold plate structure, and the structure is complex; the traditional statistical method is difficult to be used for feature discrimination of dozens of pixel levels, the recognition calculation amount based on CNN is large, and on-orbit application is difficult to be effectively realized. The LU decomposition can amplify the fine difference between the target and noise, so that the false alarm target can be removed rapidly.
However, conventional LU decomposition is partial block parallelism based on sparsity of input matrix values, or a pipeline process is reconstructed using a systolic structure. The designed architecture is difficult to be suitable for special complex application conditions, and meanwhile, as LU calculation elements have correlation from front to back, running water calculation is difficult to be effectively unfolded in a pulsation structure, so that the resolution calculation precision and the real-time performance of the resolution calculation are affected.
Therefore, how to improve the accuracy of the decomposition calculation on the basis of ensuring the real-time performance of the decomposition calculation is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a system and a method for on-orbit real-time LU decomposition and calculation of space target features, which provides a fully-pipelined parallel computing architecture for digital logic devices, designs the computing flow and time sequence of each sub-step by combing the correlation of each element in the computing process, and realizes real-time LU decomposition of space target features on the basis of ensuring computing precision and executable frequency.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
an on-orbit real-time parallel LU decomposition computing system of spatial target features, comprising: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set;
the LU parallel computing scheduling controller is respectively connected with the input register set, the output L matrix register set, the output U matrix register set and the computing unit PE;
the input register set is used for receiving and storing data to be decomposed;
the calculation unit group is used for carrying out parallel running water calculation on the input data to be decomposed to obtain an L matrix result and a U matrix result;
the output L matrix register set is used for storing the L matrix result to be output;
the output U matrix register set is used for storing the U matrix result to be output;
the LU parallel computing scheduling controller is used for controlling the input register set to receive and store data to be decomposed, generating a time sequence scheduling mechanism according to the computing unit set, reading the data to be decomposed through the time sequence scheduling mechanism, sequentially transmitting the data to be decomposed to the computing unit set for LU decomposition computation, and outputting the L matrix result and the U matrix result.
Preferably, each input register in the input register group is marked as;
Each output L matrix register mark in the output L matrix register group is as follows;
Each output U matrix register mark in the output U matrix register group is as follows;
Where i and j represent two-dimensional register row and column identifications, for an N-dimensional matrix,and->All belong to positive integers.
Preferably, the computing unit group includes t computing unitsT is a positive integer;
the computing unitComprising the following steps: multiplier->Adder->Gate->And a reciprocal calculator;
the multiplier is used for executing multiplication operation;
the adder is used for executing addition operation;
the gate is used for selecting signals according toOutputting the received signal, f=1, 2,3,4,5,6;
the reciprocal calculator is used for carrying out table look-up, multiplication, subtraction and shift operation on the input data to be decomposed.
An on-orbit real-time parallel LU decomposition calculation method for spatial target features comprises the following steps:
the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed;
the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to a computing unit group, reads the data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition computation to obtain an L matrix result and a U matrix result;
the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register set and an output U matrix register set;
and the LU parallel computing scheduling controller reads and outputs the L matrix result and the U matrix result.
Preferably, the maximum of the data to be decomposed, the L matrix result and the U matrix result is 6x6 matrix;
the computing unit group comprises t computing units for parallel computingWherein t is a positive integer of 8 or less; the computing element->Comprising the following steps: multiplier->Adder->Gate->And a reciprocal calculator, f=1, 2,3,4,5,6.
Preferably, reading the data to be decomposed by the timing schedule mechanism, and sequentially transmitting the data to be decomposed to the computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the computing unit group to complete LU decomposition computation after 35 periods;
step 2.1. Input registerDirect assignment:,The process is completed in the 1 st period;representing an output U matrix register;
step 2.2. By means of a calculation unitThe reciprocal calculator in (2) calculates +.3 in cycle 3>According to>,By means of a computing unit->To->Parallel computing to get->Is calculated according to the calculation result of (2);Representing an L matrix register;
step 2.3 according to,By means of a computing unit->To->Parallel computation, get +.5 in period>Is calculated according to the calculation result of (2);
step 2.4. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>,By means of a computing unit->To->Parallel computation, get +.9 in period>Is calculated according to the calculation result of (2);
step 2.5. According to,By means of a computing unit->To the point ofParallel computation, get +.12 in cycle>Is calculated according to the calculation result of (2);
step 2.6. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates the value obtained in 15 th periodAccording to>,By means of a computing unit->To->Parallel computation gets +.>Is calculated according to the calculation result of (2);
step 2.7. According to,By means of a computing unit->To->Parallel computation, get +.19 in 19 th cycle>Is calculated according to the calculation result of (2);
step 2.8. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>,By means of a computing unit->To->Parallel computation, get +.23 in cycle>Is calculated according to the calculation result of (2);
step 2.9. According to,By means of a computing unit->To->Parallel computation, get +.27 th cycle>Is calculated according to the calculation result of (2);
step 2.10. By means of a calculation unitThe reciprocal calculator in (2) gets +.>According to>By means of a computing unit->Calculate to get +.>Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit->Calculate to get +.35 in cycle 35>Is calculated by the computer.
Preferably, a single said computing unitUp to 12 input data are processed simultaneously +.>,。
Preferably, the computing unitProcessing input data +.>The process of (1) comprises:
enters a reciprocal calculator and passes through three clock cycles and a gating device +.>At->The output results are multiplied, and the obtained result is then added with +.>At->The output results are added;Represents a selection signal, f=1, 2,3,4,5,6;
and->Input to multiplier->The result obtained is passed through a gating device +.>When->Time-selective input adder->When->Time-selective input adder->;
And->Input to multiplier->The result obtained is input to the adder->;
And->Input to multiplier->The result obtained is input to the adder->;
And->Input to multiplier->The obtained result is subjected to->When->Time selection input adderWhen->Time-selective input adder->;
And->Input to multiplier->The result obtained is input to the adder->;
Input to a gateWhen->Time-selective input adder->When->Time-selective input adder->。
Preferably, the computing unitThe process of processing the input data further comprises:
adder deviceThe output of (2) is passed through the gate->When->Time-selective input adder->When->Time-selective input adder->;
Adder deviceThe output of (2) is directly to adder->;
Adder deviceThe output of (2) is directly to adder->;
Adder deviceThe output of (2) is passed through the gate->When->Time-selective input multiplier->When->Time-selective input adder->;
Adder deviceThe output of (2) is passed through the gate->When->Output as final result->When->Time-selective input multiplier->。
Preferably, the reciprocal calculator data processing procedure includes:
preprocessing the data S to be calculated, and right-shifting the decimal point to the right side of the first 1 to obtain processed dataAnd recording the shift times p;
processing the dataBinary representation is performed to obtain +.>And splitting said binary representation to obtain +.>And->;And->Representation->And->Is a bit width of (2);
as address input look-up table, get the result +.>;
The calculation formula of the reciprocal calculator is as follows:
;
wherein,,representing a binary representation operation on a variable, +.>,Initially 0, & gt>Representing a look-up operation->。
Compared with the prior art, the invention discloses and provides the space target feature on-orbit real-time parallel LU decomposition computing system and method, which have the following beneficial effects:
1. the hardware parallel computing system designed by the invention can be suitable for programming application of digital logic devices, the design architecture considers the maximum executable frequency in the hardware digital logic design process, the occupied resources are limited, and the stored matrix data can be subjected to 1:1 real-time LU decomposition processing.
2. The invention uses a computing unitThe gating signals in the system realize that different data are subjected to LU decomposition calculation under a unified architecture, and adapt to parallel calculation application of targets with different sizes.
3. The invention adopts fixed-point calculation, adopts a mode of combining table look-up and expansion approximation for division operation, saves hardware resources, improves calculation precision, and is suitable for application of large, medium and small digital logic devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a space target feature parallel LU decomposition computing system provided by the invention.
Fig. 2 is a flow chart of a space target feature parallel LU decomposition method provided by the present invention.
Fig. 3a is a timing diagram of the 1 st cycle to 14 th cycle of the timing schedule mechanism according to the present invention.
Fig. 3b is a timing diagram of 14 th cycle to 27 th cycle of the timing schedule mechanism according to the present invention.
Fig. 3c is a timing diagram of 27 th cycle to 35 th cycle of the timing schedule mechanism according to the present invention.
Fig. 4 is a schematic diagram of a computing unit PE structure and data processing according to the present invention.
Fig. 5 is a schematic diagram of a reciprocal calculator structure and data processing according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention discloses an on-orbit real-time parallel LU decomposition computing system for spatial target features, including: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set;
the LU parallel computing scheduling controller is respectively connected with the input register set, the output L matrix register set, the output U matrix register set and the computing unit PE;
the input register set is used for receiving and storing data to be decomposed;
the computing unit group is used for carrying out parallel running water computation on the input data to be decomposed to obtain an L matrix result and a U matrix result;
the output L matrix register set is used for storing an L matrix result to be output;
the output U matrix register set is used for storing the U matrix result to be output;
the LU parallel computing scheduling controller is used for controlling the input register set to receive and store data to be decomposed, generating a time sequence scheduling mechanism according to the computing unit set, reading the data to be decomposed through the time sequence scheduling mechanism, sequentially transmitting the data to be decomposed to the computing unit set for LU decomposition computation, and outputting an L matrix result and a U matrix result.
Preferably, the output L matrix register set is further configured to store temporary variables of the L matrix in the iterative process; the output U matrix register set is also used for storing temporary variables of the U matrix in the iterative process.
Preferably, the input register set is a collection of register hardware, each register being marked asThe method comprises the steps of carrying out a first treatment on the surface of the The output L matrix register set is a set of register hardware, each register is marked +.>The method comprises the steps of carrying out a first treatment on the surface of the The output U matrix register set is a collection of register hardware, each register is marked +.>The method comprises the steps of carrying out a first treatment on the surface of the Where i and j represent two-dimensional register row and column identifications, for an N-dimensional matrix,and->All belong to positive integers.
Preferably, the group of computing units comprises t computing unitsT is a positive integer;
each computing unitAll include: multiplier->Adder->Gate->And a reciprocal calculator;
the multiplier is used for executing multiplication operation;
the adder is used for executing addition operation;
the gate is used for selecting signals according toOutputting the received signal, f=1, 2,3,4,5,6;
the reciprocal calculator is used for carrying out table look-up, multiplication, subtraction and shift operation on the input data to be decomposed.
As shown in fig. 2, the embodiment of the invention discloses an on-orbit real-time parallel LU decomposition calculation method for space target features, which comprises the following steps:
the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed;
the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to the computing unit group, reads data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition and calculation to obtain an L matrix result and a U matrix result;
the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register set and an output U matrix register set;
the LU parallel computing scheduling controller reads and outputs an L matrix result and a U matrix result.
Preferably, the maximum of the data to be decomposed, the L matrix result and the U matrix result is 6x6 matrix;
the computing unit group comprises t computing units for parallel computingWherein t is a positive integer of 8 or less; computing element->Comprising the following steps: multiplier->Adder->Gate->And a reciprocal calculator, f=1, 2,3,4,5,6.
Preferably, as shown in fig. 3a-c, reading data to be decomposed by a timing scheduling mechanism, and sequentially transmitting the data to be decomposed to a computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the calculation unit group to complete LU decomposition calculation after 35 periods;
step 2.1. Input registerDirect assignment:,The process is completed in the 1 st period;representing an output U matrix register;
step 2.2. By means of a calculation unitThe reciprocal calculator in (2) calculates +.3 in cycle 3>According to>,By means of a computing unit->To->Parallel computing to get->Is calculated according to the calculation result of (2);Representing an L matrix register;
step 2.3 according to,By means of a computing unit->To->Parallel computation, get +.5 in period>Is calculated according to the calculation result of (2);
step 2.4. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>,By means of a computing unit->To->Parallel computation, get +.9 in period>Is calculated according to the calculation result of (2);
step 2.5. According to,By means of a computing unit->To the point ofParallel computation, get +.12 in cycle>Is calculated according to the calculation result of (2);
step 2.6. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates the value obtained in 15 th periodAccording to>,By means of a computing unit->To->Parallel computation gets +.>Is calculated according to the calculation result of (2);
step 2.7. According to,By means of a computing unit->To->Parallel computation, get +.19 in 19 th cycle>Is calculated according to the calculation result of (2);
step 2.8. By means of a calculation unitThe reciprocal calculator in (2) calculates +.>According to>,By means of a computing unit->To->Parallel computation, get +.23 in cycle>Is calculated according to the calculation result of (2);
step 2.9. According to,By means of a computing unit->To->Parallel computation, get +.27 th cycle>Is calculated according to the calculation result of (2);
step 2.10. By means of a calculation unitThe reciprocal calculator in (2) gets +.>According to>By means of a computing unit->Calculate to get +.>Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit->Calculate to get +.35 in cycle 35>Calculation of (2)As a result.
Preferably, the period represents a clock period, and the time between the rising edge of one clock and the next rising edge represents one period.
Preferably, the calculation content of each clock cycle in 35 cycles is as follows:
preferably, a single computing unitUp to 12 input data are processed simultaneously +.>,。
Preferably, as shown in FIG. 4, the computing unitProcessing input data +.>The process of (1) comprises:
enters a reciprocal calculator and passes through three clock cycles and a gating device +.>At->The output results are multiplied, and the obtained result is then added with +.>At->The output results are added;Represents a selection signal, f=1, 2,3,4,5,6;And->Input to multiplier->The result obtained is passed through a gating device +.>When->Time-selective input adder->When->Time-selective input adder->;And->Input to multiplier->The obtained result is input into an adder;And->Input to multiplier->The result obtained is input to the adder->;And->Input to multiplier->The obtained result is subjected to->When->Time-selective input adder->When->Time-selective input adder->;And->Input to multiplier->The result obtained is input to the adder->;Input to the gating device->When (when)Time-selective input adder->When->Time-selective input adder->。
Adder deviceThe output of (2) is passed through the gate->When->Time-selective input adder->When->Time-selective input adder->The method comprises the steps of carrying out a first treatment on the surface of the Adder->The output of (2) is directly to adder->The method comprises the steps of carrying out a first treatment on the surface of the Adder->Is directly to the adderThe method comprises the steps of carrying out a first treatment on the surface of the Adder->The output of (2) is passed through the gate->When->Time-selective input multiplier->When->Time-selective input adder->The method comprises the steps of carrying out a first treatment on the surface of the Adder->The output of (2) is passed through the gate->When->Output as final result->When (when)Time-selective input multiplier->。
Preferably, as shown in fig. 5, the reciprocal calculator data processing process includes:
preprocessing the data S to be calculatedThe decimal point moves to the right of the first 1 to obtain the processed dataAnd recording the shift times p;
will process the dataBinary representation is performed to obtain +.>And splitting the binary representation to obtain +.>And->;And->Representation->And->Is a bit width of (2);
preferably, willBinary representation is performed to obtain +.>And splitting the binary representation to obtain +.>And->;And->Representation->And->Is a bit width of (2);
as address input look-up table, get the result +.>;And->Multiplication is performed with the result subtracted by 1, the subtraction result and +.>Multiplying, shifting the multiplication result left by p bits to obtain the final reciprocal calculation estimation value +.>;
The calculation formula of the reciprocal calculator is as follows:
;
wherein,,representing a binary representation operation on a variable, +.>,Initially 0, & gt>Representing a look-up operation->。
Preferably, the method comprises the steps of,the larger the bit width of the table is, the higher the accuracy of the reciprocal calculation estimation value is, the more resources are occupied by table lookup, and the table lookup depth below 12-bit addresses is adopted. />
Preferably, the LU parallel computing scheduling controller transmits and stores LU decomposition computing results to the output L matrix register group and the U matrix register group respectively according to the first-come-first-served principle, and if the LU decomposition computing results arrive at the same time, the LU decomposition computing results are stored according to the number sequence number from small to large.
Preferably, the present invention discloses and provides a space target feature on-orbit real-time parallel LU decomposition computing system and method, which has the following beneficial effects compared with the prior art:
1. the hardware parallel computing system designed by the invention can be suitable for programming application of digital logic devices, the design architecture considers the maximum executable frequency in the hardware digital logic design process, the occupied resources are limited, and the stored matrix data can be subjected to 1:1 real-time LU decomposition processing.
2. The invention uses a computing unitThe gating signals in the system realize that different data are subjected to LU decomposition calculation under a unified architecture, and adapt to parallel calculation application of targets with different sizes.
3. The invention adopts fixed-point calculation, adopts a mode of combining table look-up and expansion approximation for division operation, saves hardware resources, improves calculation precision, and is suitable for application of large, medium and small digital logic devices.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. An on-orbit real-time parallel LU decomposition computing system for spatial target features, comprising: the system comprises an input register set, an output L matrix register set, an output U matrix register set, an LU parallel computing scheduling controller and a computing unit set;
the LU parallel computing scheduling controller is respectively connected with the input register set, the output L matrix register set, the output U matrix register set and the computing unit PE;
the input register set is used for receiving and storing data to be decomposed;
the calculation unit group is used for carrying out parallel running water calculation on the input data to be decomposed to obtain an L matrix result and a U matrix result;
the output L matrix register set is used for storing the L matrix result to be output;
the output U matrix register set is used for storing the U matrix result to be output;
the LU parallel computing scheduling controller is used for controlling the input register set to receive and store data to be decomposed, generating a time sequence scheduling mechanism according to the computing unit set, reading the data to be decomposed through the time sequence scheduling mechanism, sequentially transmitting the data to be decomposed to the computing unit set for LU decomposition computation, and outputting the L matrix result and the U matrix result;
reading the data to be decomposed through the time schedule mechanism, and sequentially transmitting the data to be decomposed to the computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the computing unit group to complete LU decomposition computation after 35 periods;
step 2.1. Input register a (i,j) Direct assignment: a, a (1,i) =U (1,i) ,i∈[1,2,3,4,5,6]The process is completed in the 1 st period; u (U) (i,j) Representing an output U matrix register;
step 2.2. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 3 rd period (1,1) According to the calculation result of L at the same time (j,1) =a (j,1) (1/U (1,1) ),j∈[2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing to obtain L (j,1) Is calculated according to the calculation result of (2); l (L) (i,j) Representing an L matrix register;
step 2.3. According to U (2,i) =a (2,i) -L (2,1) U (1,i) ,i∈[1,2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing, obtaining U in the 5 th period (2,i) Is calculated according to the calculation result of (2);
step 2.4. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 8 th period (2,2) According to the calculation result of L at the same time (j,2) =a (j,2) -(L (j,1) U (1,2) /U (2,2) ),j∈[3,4,5,6]By means of a calculation unit PE 2 To PE 5 : parallel computing, obtaining L in 9 th period (j,2) Is calculated according to the calculation result of (2);
step 2.5. According toBy means of a computing unit PE 1 To PE 4 Parallel computing, obtaining U in 12 th period (3,i) Is calculated according to the calculation result of (2);
step 2.6. By calculationUnit PE 1 The reciprocal calculator in (2) calculates 1/U at 15 th period (3,3) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 4 Parallel computation to get L at 16 th cycle (j,3) Is calculated according to the calculation result of (2);
step 2.7. According toBy means of a computing unit PE 1 To PE 3 Parallel computing, obtaining U in 19 th period (4,i) Is calculated according to the calculation result of (2);
step 2.8. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at 22 th period (4,4) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 3 Parallel computing, obtaining L in 23 rd period (j,4) Is calculated according to the calculation result of (2);
step 2.9. According toBy means of a computing unit PE 1 To PE 2 Parallel computing, obtaining U in 27 th period (5,i) Is calculated according to the calculation result of (2);
step 2.10. By means of the calculation unit PE 1 The reciprocal calculator in (2) obtains 1/U in 30 th period (5,5) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 Calculation to obtain L at cycle 32 (6,5) Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit PE 1 Calculate U at 35 th cycle (6,6) Is calculated according to the calculation result of (2);
calculation unit PE t Processing input data m k The process of (1) comprises:
m 0 enters a reciprocal calculator and passes through three clock cycles and a gating device MUX 5 At Sel 5 Multiplying the output result when=0, and then the obtained result is multiplied by MUX 6 At Sel 6 Output results when=0 are added; sel (Sel) f Represents a selection signal, f=1, 2,3,4,5,6;
m 1 and m is equal to 2 Input to multiplier M 1 The result obtained is passed through a gate MUX 1 When Sel 1 Select input adder a when=0 5 When Sel 1 Select input adder a when=1 1 ;
m 3 And m is equal to 4 Input to multiplier M 2 The obtained result is input into an adder A 1 ;
m 5 And m is equal to 6 Input to multiplier M 3 The obtained result is input into an adder A 2 ;
m 7 And m is equal to 8 Input to multiplier M 4 The result obtained is passed through MUX 3 When Sel 2 Select input adder a when=0 2 When Sel 2 Select input adder a when=1 5 ;
m 9 And m is equal to 10 Input to multiplier M 5 The obtained result is input into an adder A 3 ;
m 11 Input to a gate MUX 6 When Sel 6 Select input adder a when=0 6 When Sel 6 Select input adder a when=1 3 ;
The process of processing the input data by the computing unit PEt further comprises:
adder A 1 The output of (a) passes through a gate MUX 2 When Sel 3 Select input adder a when=0 5 When Sel 3 Select input adder a when=1 4 ;
Adder A 2 The output of the adder is directly connected to the adder A 4 ;
Adder A 3 The output of the adder is directly connected to the adder A 5 ;
Adder A 4 The output of (a) passes through a gate MUX 4 When Sel 4 Select input multiplier M when=0 6 When Sel 4 Select input adder a when=1 5 ;
Adder A 5 The output of (a) passes through a gate MUX 5 When Sel 5 Output as final result n when=0 0 When Sel 5 Select input multiplier M when=1 6 ;
The reciprocal calculator data processing process includes:
preprocessing the data S to be calculated, and right-shifting the decimal point to the right side of the first 1 to obtain processed data f trans (S) and recording the shift number p;
processing the data f trans (S) performing binary representation to obtain [ S ] M-1 ,s M-2 ,…,s N ,s N-1 ,…,s 0 ] 2 And split the binary representation to obtain x S =[s M-1 ,s M-2 ,…,s N ] 2 And D S =[s N-1 ,s N-2 ,…,s 0 ] 2 The method comprises the steps of carrying out a first treatment on the surface of the M and N represent x S And D S Is a bit width of (2);
x S as an address input lookup table to obtain the result f Table (x S );
The calculation formula of the reciprocal calculator is as follows:
wherein [ (S)] 2 Representing binary representation operations on variables, M.gtoreq.N, p is initially 0, f Table (. Cndot.) means a look-up table operation,
2. the spatial target feature on-orbit real-time parallel LU decomposition calculation system according to claim 1, wherein each input register in the input register set is labeled a (i,j) ;
Each output L matrix register mark in the output L matrix register group is L (i,j) ;
Each output U matrix register mark in the output U matrix register group is U (i,j) ;
Wherein i and j represent two-dimensional register row and column identifications, and for an N-dimensional matrix, i, j E [1, N ] and i, j belong to positive integers.
3. The on-orbit real-time parallel LU decomposition computing system according to claim 1, wherein the computing unit group includes t computing units PE t T is a positive integer;
the computing unit PE t Comprising the following steps: multiplier M f Adder A f Gate MUX f And a reciprocal calculator;
the multiplier is used for executing multiplication operation;
the adder is used for executing addition operation;
the gate is used for selecting the signal Sel according to f Outputting the received signal, f=1, 2,3,4,5,6;
the reciprocal calculator is used for carrying out table look-up, multiplication, subtraction and shift operation on the input data to be decomposed.
4. The on-orbit real-time parallel LU decomposition calculation method for the space target features is characterized by comprising the following steps of:
the LU parallel computing scheduling controller controls the input register group to receive and store data to be decomposed;
the LU parallel computing scheduling controller generates a time sequence scheduling mechanism according to a computing unit group, reads the data to be decomposed through the time sequence scheduling mechanism, and sequentially transmits the data to be decomposed to the computing unit group for LU decomposition computation to obtain an L matrix result and a U matrix result;
the LU parallel computing scheduling controller correspondingly transmits and stores the L matrix result and the U matrix result to an output L matrix register group and a U matrix register group;
the LU parallel computing scheduling controller reads and outputs the L matrix result and the output U matrix result;
reading the data to be decomposed through the time schedule mechanism, and sequentially transmitting the data to be decomposed to the computing unit group to perform LU decomposition computation specifically includes:
the data to be decomposed is input into the computing unit group to complete LU decomposition computation after 35 periods;
step 2.1. Input register a (i,j) Direct assignment: a, a (1,i) =U (1,i) ,i∈[1,2,3,4,5,6]The process is completed in the 1 st period; u (U) (i,j) Representing an output U matrix register;
step 2.2. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 3 rd period (1,1) According to the calculation result of L at the same time (j,1) =a (j,1) (1/U (1,1) ),j∈[2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing to obtain L (j,1) Is calculated according to the calculation result of (2); l (L) (i,j) Representing an L matrix register;
step 2.3. According to U (2,i) =a (2,i) -L (2,1) U (1,i) ,i∈[1,2,3,4,5,6]By means of a calculation unit PE 1 To PE 5 Parallel computing, obtaining U in the 5 th period (2,i) Is calculated according to the calculation result of (2);
step 2.4. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at the 8 th period (2,2) According to the calculation result of L at the same time (j,2) =a (j,2) -(L (j,1) U (1,2) /U (2,2) ),j∈[3,4,5,6]By means of a calculation unit PE 2 To PE 5 : parallel computing, obtaining L in 9 th period (j,2) Is calculated according to the calculation result of (2);
step 2.5. According toBy means of a computing unit PE 1 To PE 4 Parallel computing, obtaining U in 12 th period (3,i) Is calculated according to the calculation result of (2);
step 2.6. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at 15 th period (3,3) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 4 Parallel computation to get L at 16 th cycle (j,3) Is calculated according to the calculation result of (2);
step 2.7. According toBy means of a computing unit PE 1 To PE 3 Parallel computing, obtaining U in 19 th period (4,i) Is calculated according to the calculation result of (2);
step 2.8. By means of the calculation unit PE 1 The reciprocal calculator in (2) calculates 1/U at 22 th period (4,4) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 To PE 3 Parallel computing, obtaining L in 23 rd period (j,4) Is calculated according to the calculation result of (2);
step 2.9. According toBy means of a computing unit PE 1 To PE 2 Parallel computing, obtaining U in 27 th period (5,i) Is calculated according to the calculation result of (2);
step 2.10. By means of the calculation unit PE 1 The reciprocal calculator in (2) obtains 1/U in 30 th period (5,5) According to the calculation result of (1) at the same timeBy means of a computing unit PE 2 Calculation to obtain L at cycle 32 (6,5) Is calculated according to the calculation result of (2);
step 2.11 according toBy means of a computing unit PE 1 Calculate U at 35 th cycle (6,6) Is calculated according to the calculation result of (2);
the computing unit PEt processes the input data m k The process of (1) comprises:
m 0 enters a reciprocal calculator and passes through three clock cycles and a gating device MUX 5 At Sel 5 Multiplying the output result when=0, and then the obtained result is multiplied by MUX 6 At Sel 6 Output results when=0 are added; sel (Sel) f Represents a selection signal, f=1, 2,3,4,5,6;
m 1 and m is equal to 2 Input to multiplier M 1 The result obtained is passed through a gate MUX 1 When Sel 1 Select input adder a when=0 5 When Sel 1 Select input adder a when=1 1 ;
m 3 And m is equal to 4 Input to multiplier M 2 The obtained result is input into an adder A 1 ;
m 5 And m is equal to 6 Input to multiplier M 3 The obtained result is input into an adder A 2 ;
m 7 And m is equal to 8 Input to multiplier M 4 The result obtained is passed through MUX 3 When Sel 2 Select input adder a when=0 2 When Sel 2 Select input adder a when=1 5 ;
m 9 And m is equal to 10 Input to multiplier M 5 The obtained result is input into an adder A 3 ;
m 11 Input to a gate MUX 6 When Sel 6 Select input adder a when=0 6 When Sel 6 Select input adder a when=1 3 ;
The process of processing the input data by the computing unit PEt further comprises:
adder A 1 The output of (a) passes through a gate MUX 2 When Sel 3 Select input adder a when=0 5 When Sel 3 Select input adder a when=1 4 ;
Adder A 2 The output of the adder is directly connected to the adder A 4 ;
Adder A 3 The output of the adder is directly connected to the adder A 5 ;
Adder A 4 The output of (a) passes through a gate MUX 4 When Sel 4 Select input multiplier M when=0 6 When Sel 4 Select input adder a when=1 5 ;
Adder A 5 The output of (a) passes through a gate MUX 5 When Sel 5 Output as final result n when=0 0 When Sel 5 Select input multiplier M when=1 6 ;
The reciprocal calculator data processing process includes:
preprocessing the data S to be calculated, and right-shifting the decimal point to the right side of the first 1 to obtain processed data f trans (S) and recording the shift number p;
processing the data f trans (S) performing binary representation to obtain [ S ] M-1 ,s M-2 ,…,s N ,s N-1 ,…,s 0 ] 2 And split the binary representation to obtain x S =[s M-1 ,s M-2 ,…,s N ] 2 And D S =[s N-1 ,s N-2 ,…,s 0 ] 2 The method comprises the steps of carrying out a first treatment on the surface of the M and N represent x S And D S Is a bit width of (2);
x S as an address input lookup table to obtain the result f Table (x S );
The calculation formula of the reciprocal calculator is as follows:
wherein [ (S)] 2 Representing binary representation operations on variables, M.gtoreq.N, P initially being 0, f Table (. Cndot.) means a look-up table operation,
5. the on-orbit real-time parallel LU decomposition calculation method of space target features according to claim 4, wherein the maximum of the data to be decomposed, the L matrix result and the U matrix result is a 6x6 matrix;
the computing unit group comprises t computing units PEt for parallel computing, wherein t is a positive integer less than or equal to 8; the computing unit PE t Comprising the following steps: multiplier M f Adder A f Gate MUX f And a reciprocal calculator, f=1, 2,3,4,5,6.
6. The on-orbit real-time parallel LU decomposition calculation method of space object features according to claim 4, wherein a single calculation unit PEt processes at most 12 input data m simultaneously k ,K∈[0,1,…,11]。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310827007.XA CN116560733B (en) | 2023-07-07 | 2023-07-07 | Space target feature on-orbit real-time parallel LU decomposition computing system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310827007.XA CN116560733B (en) | 2023-07-07 | 2023-07-07 | Space target feature on-orbit real-time parallel LU decomposition computing system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116560733A CN116560733A (en) | 2023-08-08 |
CN116560733B true CN116560733B (en) | 2023-10-24 |
Family
ID=87502175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310827007.XA Active CN116560733B (en) | 2023-07-07 | 2023-07-07 | Space target feature on-orbit real-time parallel LU decomposition computing system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116560733B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004079585A1 (en) * | 2003-03-07 | 2004-09-16 | Matsushita Electric Industrial Co., Ltd. | Matrix operating device |
CN107341133A (en) * | 2017-06-24 | 2017-11-10 | 中国人民解放军信息工程大学 | The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions LU Decomposition |
CN110457648A (en) * | 2019-07-30 | 2019-11-15 | 暨南大学 | A kind of implementation method of the systolic array architecture decomposed for LU |
CN114996649A (en) * | 2022-05-09 | 2022-09-02 | 深圳市国微电子有限公司 | Method for realizing matrix decomposition and lower triangular matrix inversion |
-
2023
- 2023-07-07 CN CN202310827007.XA patent/CN116560733B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004079585A1 (en) * | 2003-03-07 | 2004-09-16 | Matsushita Electric Industrial Co., Ltd. | Matrix operating device |
CN107341133A (en) * | 2017-06-24 | 2017-11-10 | 中国人民解放军信息工程大学 | The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions LU Decomposition |
CN110457648A (en) * | 2019-07-30 | 2019-11-15 | 暨南大学 | A kind of implementation method of the systolic array architecture decomposed for LU |
CN114996649A (en) * | 2022-05-09 | 2022-09-02 | 深圳市国微电子有限公司 | Method for realizing matrix decomposition and lower triangular matrix inversion |
Also Published As
Publication number | Publication date |
---|---|
CN116560733A (en) | 2023-08-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11698773B2 (en) | Accelerated mathematical engine | |
CN110163360B (en) | Computing device and method | |
CN115039067A (en) | Systolic array including fused multiply accumulate with efficient pre-normalization and extended dynamic range | |
EP0275979A2 (en) | Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples | |
JP3228927B2 (en) | Processor element, processing unit, processor, and arithmetic processing method thereof | |
CN110766128A (en) | Convolution calculation unit, calculation method and neural network calculation platform | |
CN116560733B (en) | Space target feature on-orbit real-time parallel LU decomposition computing system and method | |
CN101426134A (en) | Hardware device and method for video encoding and decoding | |
CN108762719B (en) | Parallel generalized inner product reconstruction controller | |
Mohanty et al. | Design and performance analysis of fixed-point jacobi svd algorithm on reconfigurable system | |
CN106775579A (en) | Floating-point operation accelerator module based on configurable technology | |
CN116888591A (en) | Matrix multiplier, matrix calculation method and related equipment | |
CN110674456B (en) | Time-frequency conversion method of signal acquisition system | |
US10127040B2 (en) | Processor and method for executing memory access and computing instructions for host matrix operations | |
CN116578819A (en) | Sparse fraction Fourier transform FPGA implementation method and system | |
Sankaran et al. | Design and Implementation of 1024 Point Pipelined Radix 4 FFT Processor on FPGA for Biomedical Signal Processing Applications | |
Chaudhari et al. | An optimized approach to pipelined architecture for fast 2D normalized cross-correlation | |
Ghaffari et al. | A fully pipelined FPGA architecture for multiscale BRISK descriptors with a novel hardware-aware sampling pattern | |
Khamaneh et al. | Real–time memory efficient SLIC accelerator for low–power applications | |
JPH0766372B2 (en) | Floating point processor | |
Hsiao et al. | Design of a low-cost floating-point programmable vertex processor for mobile graphics applications based on hybrid number system | |
Chen et al. | Edge FPGA-based onsite neural network training | |
Chen et al. | A high-throughput and area-efficient video transform core with a time division strategy | |
TWI564735B (en) | Data allocating apparatus, signal processing apparatus, and data allocating method | |
CN114900703B (en) | Universal video coding system and data processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |