CN112437303B - JPEG decoding method and device - Google Patents
- Publication number
- CN112437303B (application CN202011263958.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- pictures
- reading
- decoded data
- paths
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H04N19/172: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
- H04N19/15: Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
- H04N19/174: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
- H04N19/423: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
- H04N19/436: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
- H04N19/625: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Compression Of Band Width Or Redundancy In Fax (AREA)
Abstract
The invention discloses a JPEG decoding method and device applied to the field of image processing. An FPGA (field programmable gate array) accelerator card reads M pictures from its DDR (double data rate) memory, where M is an integer greater than 1. The FPGA accelerator card distributes the M pictures read from the DDR to M JPEG decoders in one-to-one correspondence, and the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, obtaining M decoded data streams. The FPGA accelerator card reads and merges the M decoded data streams to obtain merged decoded data, and outputs the merged decoded data to the DDR. The method and device improve JPEG decoding efficiency.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to a JPEG decoding method and device.
Background
JPEG (Joint Photographic Experts Group) is a standard lossy compression method widely used for photographic images. JPEG itself only defines a data stream, describing how an image is converted into bytes. An additional standard created by companies such as C-Cube Microsystems, called JFIF (JPEG File Interchange Format), details how to produce a file suitable for computer storage and transmission from a JPEG stream.
As the capacity of accelerator cards increases, more cores can be placed on an FPGA (Field Programmable Gate Array) accelerator card used for JPEG decoding in order to maximize decoding throughput. However, each core occupies a certain amount of GMEM (cache) resources, and the GMEM resources on each accelerator card are limited. This limits the number of cores that can be placed and, in turn, the JPEG decoding efficiency of the FPGA.
Disclosure of Invention
In view of the above problems in the prior art, an embodiment of the present invention provides a method and an apparatus for JPEG decoding, which are used for improving the efficiency of JPEG decoding.
In a first aspect, an embodiment of the present invention provides a JPEG decoding method, applied to an FPGA accelerator card, the method including:
reading M pictures from the DDR of the FPGA accelerator card, wherein M is an integer greater than 1;
correspondingly distributing the M pictures read from the DDR to M JPEG decoders, wherein the M JPEG decoders share the same GMEM resource on the FPGA accelerator card and perform M-way parallel JPEG decoding on the M pictures to obtain M decoded data streams;
reading and merging the M decoded data streams to obtain merged decoded data; and
outputting the merged decoded data to the DDR.
Optionally, reading the M pictures from the DDR of the FPGA accelerator card includes:
sequentially reading data segments of the M pictures from the DDR based on a preset read-in sequence and a data segment size until all data of the M pictures has been read.
Optionally, sequentially reading the data segments of the M pictures from the DDR based on the preset read-in sequence and the data segment size includes:
judging a first overall state for the M pictures, wherein the first overall state represents whether all of the M pictures have been completely read in;
if the first overall state is no, polling each of the M pictures to check whether unread data remains, and if unread data remains in the picture currently being checked, reading in the next data segment from that picture and updating its data state;
updating the first overall state according to the data state of each of the M pictures, and returning to the step of judging the first overall state for the M pictures.
Optionally, reading and merging the M decoded data streams to obtain merged decoded data includes:
sequentially reading out decoded data segments from the M decoded data streams based on a preset read-out sequence;
merging the decoded data segments read out from the M decoded data streams to obtain the merged decoded data.
Optionally, sequentially reading out the decoded data segments from the M decoded data streams based on the preset read-out sequence includes:
judging a second overall state of the M decoded data streams, wherein the second overall state represents whether the M decoded data streams are all empty;
if not, judging whether the i-th decoded data stream of the M decoded data streams is empty, and if it is not empty, reading out the next decoded data segment from the i-th decoded data stream, where i takes the values 1 to M in sequence;
updating the second overall state according to the state of each of the M decoded data streams, and returning to the step of judging the second overall state of the M decoded data streams.
In a second aspect, an embodiment of the present invention provides a JPEG decoding device, applied to an FPGA accelerator card, the device including:
an input unit, configured to read M pictures from the DDR of the FPGA accelerator card, wherein M is an integer greater than 1;
a decoding unit, configured to correspondingly distribute the M pictures read from the DDR to M JPEG decoders, wherein the M JPEG decoders share the same GMEM resource on the FPGA accelerator card and perform M-way parallel JPEG decoding on the M pictures to obtain M decoded data streams;
an output unit, configured to read and merge the M decoded data streams to obtain merged decoded data, and to output the merged decoded data to the DDR.
Optionally, the input unit is specifically configured to:
sequentially read data segments of the M pictures from the DDR based on a preset read-in sequence and a data segment size until all data of the M pictures has been read.
Optionally, the input unit includes:
a first judging subunit, configured to judge a first overall state for the M pictures, wherein the first overall state represents whether all of the M pictures have been completely read in;
a polling subunit, configured to, if the first overall state is no, poll each of the M pictures to check whether unread data remains, and if unread data remains in the picture currently being checked, read in the next data segment from that picture and update its data state;
a first state updating subunit, configured to update the first overall state according to the data state of each of the M pictures and return to the step of judging the first overall state for the M pictures.
Optionally, the output unit includes:
a readout subunit, configured to sequentially read out decoded data segments from the M decoded data streams based on a preset read-out sequence;
a merging subunit, configured to merge the decoded data segments read out from the M decoded data streams to obtain the merged decoded data.
Optionally, the readout subunit includes:
a second judging subunit, configured to judge a second overall state of the M decoded data streams, wherein the second overall state represents whether the M decoded data streams are all empty;
a decoded data reading subunit, configured to, if not, judge whether the i-th decoded data stream of the M decoded data streams is empty, and if it is not empty, read out the next decoded data segment from the i-th decoded data stream, where i takes the values 1 to M in sequence;
a second state updating subunit, configured to update the second overall state according to the state of each of the M decoded data streams and return to the step of judging the second overall state of the M decoded data streams.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a main processor, an FPGA accelerator card, a DDR cache on the FPGA accelerator card, and a computer program stored on the memory and operable on the FPGA accelerator card, where the DDR cache obtains M pictures from the main processor, and M is an integer greater than 1; when the FPGA accelerator card executes the program, the method according to any one of the first aspect is implemented for the M pictures.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the methods of the first aspect.
The one or more technical solutions provided by the embodiments of the present invention at least achieve the following technical effects or advantages:
According to the JPEG decoding method and device provided by the embodiments of the invention, the FPGA accelerator card correspondingly distributes the M pictures read from the DDR to M JPEG decoders; the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, obtaining M corresponding decoded data streams; the M decoded data streams are read and merged to obtain merged decoded data; and the merged decoded data is output to the DDR. Because multiple data paths share the same GMEM resource, multi-way parallelization at the JPEG file level is achieved, which improves the efficiency of JPEG decoding on the FPGA accelerator card.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flowchart showing a JPEG decoding method in an embodiment of the present invention;
FIG. 2 is a diagram showing a data flow of a JPEG decoding method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram showing a structure of a JPEG decoding apparatus in an embodiment of the present invention.
Detailed Description
In order to solve the problem that the prior art limits the JPEG decoding efficiency of an FPGA, the embodiment of the invention provides a JPEG decoding method and device, and the general idea is as follows:
The FPGA accelerator card performs M-way parallel JPEG decoding on M pictures read from DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory) using M JPEG decoders that share the same GMEM resource on the FPGA accelerator card, obtaining M corresponding decoded data streams; the M decoded data streams are then merged and output to the DDR. Because multiple data paths share the same GMEM resource, multi-way parallelization at the JPEG file level is achieved, which improves the efficiency of JPEG decoding by the FPGA.
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.
First embodiment
The embodiment of the invention provides a JPEG decoding method, which is applied to an FPGA accelerator card and realizes high-efficiency JPEG decoding based on the FPGA accelerator card.
The JPEG decoding method provided in the embodiment of the present invention is described in detail below with reference to FIG. 1 and FIG. 2:
First, step S101 is performed: M pictures are read from the DDR of the FPGA accelerator card, where M is an integer greater than 1.
In a specific implementation, step S101 may read each whole picture at once and execute step S102 after all M pictures have been read. Alternatively, in order to start decoding the M pictures as early as possible and thereby improve decoding efficiency, step S101 may read data segments of the M pictures from the DDR in turn, based on a preset read-in sequence and a fixed data segment size, until all data of the M pictures has been read. For each of the M pictures, the JPEG decoding process for that picture is triggered as soon as one of its data segments has been read from the DDR, so decoding does not have to wait until the whole picture has been read.
Specifically, the data segments of the M pictures are read from the DDR in the preset read-in sequence as follows: first, the first data segment of each of the M pictures is read in turn; next, the second data segment of each of the M pictures is read in turn; then the third data segment of each of the M pictures, and so on, until the last data segment of each of the M pictures has been read. At that point all data of the M pictures has been read, and the flow of reading data segments from the DDR ends.
For example, the detailed read-in sequence may be: read in data segment 1 of the 1st picture, data segment 1 of the 2nd picture, data segment 1 of the 3rd picture, and so on, until data segment 1 of the M-th picture is read in; then read in data segment 2 of the 1st picture, data segment 2 of the 2nd picture, data segment 2 of the 3rd picture, and so on. After the last data segment of the M-th picture is read in, the flow of reading data segments from the DDR ends.
Since the multiple decoders share GMEM resources, in order to ensure the reliability of the process of reading data segments from the DDR, sequentially reading the data segments of the M pictures from the DDR based on the preset read-in sequence and the data segment size specifically includes the following steps S1011 to S1013:
Step S1011: judge a first overall state for the M pictures, where the first overall state represents whether all of the M pictures have been completely read in.
Step S1012: if the first overall state is no, poll each of the M pictures to check whether unread data remains; if unread data remains in the picture currently being checked, read in the next data segment from that picture and update its data state.
Step S1013: after the M pictures have been polled, update the first overall state according to the data state of each of the M pictures, and return to step S1011.
The first overall state indicates whether all M pictures have been completely read. Specifically, it is calculated from the data state of each of the M pictures, and the data state of each picture indicates whether that picture has been completely read.
To make the flow of reading M pictures from the DDR easier to understand, the flow is illustrated below for the case of three pictures, picture A, picture B and picture C, with reference to FIG. 2 and the following steps 1 to 5:
Step 1: acquire the first overall state for pictures A, B and C, and judge from it whether pictures A, B and C have all been completely read; if so, end the flow of reading data segments from the DDR; otherwise, perform step 2.
Step 2: read the data state of picture A, and judge from it whether the data in picture A has been completely read; if so, go directly to step 3; otherwise, read in the next data segment from picture A, update the data state of picture A, and then perform step 3.
Step 3: read the data state of picture B, and judge from it whether the data in picture B has been completely read; if so, go directly to step 4; otherwise, read in the next data segment from picture B, update the data state of picture B, and then perform step 4.
Step 4: read the data state of picture C, and judge from it whether the data in picture C has been completely read; if so, go directly to step 5; otherwise, read in the next data segment from picture C, update the data state of picture C, and then perform step 5.
Step 5: calculate the first overall state from the current data states of pictures A, B and C, and return to step 1.
Steps 1 to 5 are repeated until all the data of pictures A, B and C has been read, at which point the flow of reading data segments from the DDR ends. In this way data segments are read from each picture continuously and in turn, each JPEG decoder is supplied with the input data it needs, and the multi-way parallel JPEG decoding proceeds smoothly.
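The polling flow of steps 1 to 5 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the FPGA implementation: the pictures are modeled as in-memory byte strings rather than DDR regions, and the names `read_segments_round_robin`, `offsets`, `done` and `all_done` are hypothetical.

```python
from typing import Iterator, List, Tuple


def read_segments_round_robin(
    pictures: List[bytes], segment_size: int
) -> Iterator[Tuple[int, bytes]]:
    """Yield (picture_index, segment) pairs in the polled order of steps 1-5.

    Assumption: each picture is a bytes object standing in for a DDR region.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")

    offsets = [0] * len(pictures)               # next unread byte per picture
    done = [len(p) == 0 for p in pictures]      # per-picture data state
    all_done = all(done)                        # first overall state

    while not all_done:                         # step 1 (S1011)
        for i, picture in enumerate(pictures):  # steps 2-4 (S1012)
            if done[i]:
                continue                        # nothing left, skip this picture
            start = offsets[i]
            segment = picture[start:start + segment_size]
            offsets[i] = start + len(segment)
            done[i] = offsets[i] >= len(picture)  # update the data state
            yield i, segment
        all_done = all(done)                    # step 5 (S1013)


if __name__ == "__main__":
    # Pictures A, B and C with different lengths, read in 2-byte segments.
    for index, segment in read_segments_round_robin([b"AAAAA", b"BB", b"CCCC"], 2):
        print(index, segment)
```

A picture that runs out of data early is simply skipped in later rounds, so the remaining pictures keep being served until the first overall state becomes true.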
S102: The M pictures read from the DDR are correspondingly distributed to M JPEG decoders, and the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, obtaining M decoded data streams.
Specifically, if step S101 does not read each whole picture at once, the corresponding JPEG decoder is triggered to start JPEG decoding of a picture as soon as the first data segment of that picture is read in. It should be noted that M-way JPEG decoding means that the M JPEG decoders decode their corresponding pictures simultaneously. Each decoding path proceeds as follows: the current data segment read from the corresponding picture is decompressed; the decompression result is then dequantized; finally, an inverse discrete cosine transform is applied to the dequantization result to obtain the decoded data segment corresponding to that data segment. The next data segment is then decoded in the same way, and the decoded data segments in sequence form the decoded data stream corresponding to the picture.
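As an illustration of the last two stages of each decoding path, the following minimal sketch dequantizes one block of coefficients and applies an inverse discrete cosine transform. It rests on assumptions not stated in the text above: the decompression stage is taken as already done, the block is the 8x8 block used by baseline JPEG, coefficients are already in row and column order, and the function names are hypothetical. The transform is written in its direct, unoptimized form for readability.

```python
import math
from typing import List

N = 8  # assumed block size (8x8, as in baseline JPEG)
Block = List[List[float]]


def dequantize(quantized: List[List[int]], quant_table: List[List[int]]) -> Block:
    """Multiply each quantized coefficient by its quantization table entry."""
    return [
        [float(quantized[u][v] * quant_table[u][v]) for v in range(N)]
        for u in range(N)
    ]


def idct_8x8(coeffs: Block) -> Block:
    """Direct-form 2D inverse DCT of one 8x8 block (O(N^4), for clarity only)."""

    def scale(k: int) -> float:
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (
                        scale(u) * scale(v) * coeffs[u][v]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[x][y] = total / 4.0
    return out


def decode_block(quantized: List[List[int]], quant_table: List[List[int]]) -> Block:
    """Dequantize, then inverse-transform, one block of coefficients."""
    return idct_8x8(dequantize(quantized, quant_table))
```

With only the DC coefficient set, the output block is constant: a quantized DC value of 2 with a table entry of 16 dequantizes to 32, and the transform yields 32 / 8 = 4 in every position.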
S103: The M decoded data streams are read and merged to obtain merged decoded data.
In a specific embodiment, step S103 includes: sequentially reading out decoded data segments from the M decoded data streams based on a preset read-out sequence; and merging the decoded data segments read out from the M decoded data streams to obtain the merged decoded data.
To keep the read-out of decoded data segments running smoothly, for each of the M decoded data streams it is necessary to judge whether that stream is empty before reading from it, and to read a decoded data segment from the stream only if it is not empty. This prevents the flow from blocking when a decoded data stream contains no data.
Specifically, the decoded data segments are read out from the M decoded data streams as follows: judge a second overall state of the M decoded data streams, where the second overall state represents whether the M decoded data streams are all empty; if the second overall state indicates that the M decoded data streams are all empty, end the flow of reading out decoded data segments from the M decoded data streams; if the second overall state indicates that the M decoded data streams are not all empty, judge whether the i-th decoded data stream of the M decoded data streams is empty, and if it is not empty, read out the next decoded data segment from the i-th decoded data stream, where i takes the values 1 to M in sequence; then update the second overall state according to the state of each of the M decoded data streams, and return to the step of judging the second overall state of the M decoded data streams.
Specifically, the second overall state is calculated from the state of each of the M decoded data streams, and the state of each decoded data stream indicates whether that stream is empty.
To make the flow of reading out decoded data segments from the M decoded data streams easier to understand, the flow is illustrated below for the case of only pictures A, B and C, with reference to FIG. 2 and the following steps 11 to 15:
Step 11: acquire the second overall state for the M decoded data streams, and judge from it whether the M decoded data streams are all empty; if so, end the flow of reading out decoded data segments from the M decoded data streams; otherwise, perform step 12.
Step 12: read the state of decoded data stream 1, which corresponds to picture A, and judge whether stream 1 is empty; if so, go directly to step 13; otherwise, read out the next decoded data segment from stream 1, update the state of stream 1, and then perform step 13.
Step 13: read the state of decoded data stream 2, which corresponds to picture B, and judge whether stream 2 is empty; if so, go directly to step 14; otherwise, read out the next decoded data segment from stream 2, update the state of stream 2, and then perform step 14.
Step 14: read the state of decoded data stream 3, which corresponds to picture C, and judge whether stream 3 is empty; if so, go directly to step 15; otherwise, read out the next decoded data segment from stream 3, update the state of stream 3, and then perform step 15.
Step 15: calculate the second overall state from whether each of the three decoded data streams, streams 1 to 3, is empty, and return to step 11.
By repeating steps 11 to 15, decoded data segments are read out continuously from each decoded data stream, and the read-out flow as a whole is not blocked when an individual decoded data stream is empty.
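Steps 11 to 15 can be modeled with the short sketch below. It is a simplified illustration, not the hardware design: each decoded data stream is assumed to be a `collections.deque` that already holds all of its decoded data segments, and tagging each merged segment with its stream index is an assumption made here so that the output can be traced back to a picture. The name `merge_decoded_streams` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def merge_decoded_streams(streams: List[Deque[bytes]]) -> List[Tuple[int, bytes]]:
    """Drain the streams in polled order, skipping any stream that is empty."""
    merged: List[Tuple[int, bytes]] = []
    all_empty = all(len(s) == 0 for s in streams)      # second overall state

    while not all_empty:                               # step 11
        for i, stream in enumerate(streams):           # steps 12-14
            if not stream:
                continue                               # empty: skip, do not block
            merged.append((i, stream.popleft()))       # read next decoded segment
        all_empty = all(len(s) == 0 for s in streams)  # step 15

    return merged


if __name__ == "__main__":
    stream1 = deque([b"a1", b"a2", b"a3"])  # decoded data of picture A
    stream2 = deque([b"b1"])                # decoded data of picture B
    stream3 = deque([b"c1", b"c2"])         # decoded data of picture C
    for index, segment in merge_decoded_streams([stream1, stream2, stream3]):
        print(index, segment)
```

Checking for emptiness before every read is what keeps one exhausted stream from stalling the others. In a live system a stream may be empty only temporarily while its decoder is still producing data, so a real implementation would also need a per-stream end-of-data signal, which this sketch leaves out.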
S104, outputting the merged decoded data to the DDR.
In a second aspect, based on the same inventive concept, an embodiment of the present invention provides a JPEG decoding apparatus applied to an FPGA accelerator card. Referring to FIG. 3, the JPEG decoding apparatus includes:
an input unit 301, configured to read M pictures from a DDR of the FPGA accelerator card, where M is an integer greater than 1;
a decoding unit 302, configured to allocate the M pictures read from the DDR to M JPEG decoders in one-to-one correspondence, where the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, so as to obtain M decoded data streams;
and an output unit 303, configured to read and merge the M decoded data streams to obtain merged decoded data, and to output the merged decoded data to the DDR.
In an alternative embodiment, the input unit 301 is specifically configured to:
sequentially read the data fragments of the M pictures from the DDR based on a preset read-in order and a data fragment size until all data of the M pictures has been read.
In an alternative embodiment, the input unit 301 includes:
a first judging subunit, configured to judge a first overall state for the M pictures, where the first overall state indicates whether the M pictures have all been read in;
a polling subunit, configured to, if the first overall state is no, poll each of the M pictures to check whether it still has data and, if the picture currently being checked has data, read in the next data fragment from that picture and update its data state;
and a first state updating subunit, configured to update the first overall state according to the data state of each of the M pictures and return to the step of judging the first overall state for the M pictures.
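The polling read-in performed by these subunits can be sketched in software as follows. This is an illustrative sketch only: the pictures are modeled as in-memory `bytes` objects rather than DDR regions, the preset read-in order is assumed to be picture index order, and the name `read_in_fragments` and its parameter `fragment_size` are hypothetical.

```python
from typing import List, Tuple


def read_in_fragments(pictures: List[bytes], fragment_size: int) -> List[Tuple[int, bytes]]:
    """Read M pictures fragment by fragment, polling the pictures in index order.

    Returns (picture_index, fragment) pairs in the order they were read.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    offsets = [0] * len(pictures)
    # Data state per picture: True while unread data remains.
    has_data = [len(p) > 0 for p in pictures]
    fragments: List[Tuple[int, bytes]] = []
    all_read = not any(has_data)              # first overall state
    while not all_read:
        for i, picture in enumerate(pictures):
            if not has_data[i]:
                continue                      # this picture is fully read in
            fragment = picture[offsets[i]:offsets[i] + fragment_size]
            fragments.append((i, fragment))
            offsets[i] += len(fragment)
            has_data[i] = offsets[i] < len(picture)  # update the data state
        all_read = not any(has_data)          # update the first overall state
    return fragments


for index, fragment in read_in_fragments([b"aaaaa", b"bb", b"cccc"], 2):
    print(index, fragment)
```

With a fragment size of 2, the second picture is exhausted after one pass, and the remaining passes read only from the pictures that still have data.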
In an alternative embodiment, the output unit 303 includes:
a reading subunit, configured to sequentially read out the decoded data fragments from the M decoded data streams based on a preset read-out order;
and a merging subunit, configured to merge the decoded data fragments read out from the M decoded data streams to obtain the merged decoded data.
In an alternative embodiment, the reading subunit includes:
a second judging subunit, configured to judge a second overall state for the M decoded data streams, where the second overall state indicates whether the M decoded data streams are all empty;
a decoded data reading subunit, configured to, if the second overall state is no, judge whether the i-th decoded data stream of the M decoded data streams is empty and, if it is not empty, read the next decoded data fragment from the i-th decoded data stream, where i takes the values 1 to M in sequence;
and a second state updating subunit, configured to update the second overall state according to the state of each of the M decoded data streams and return to the step of judging the second overall state for the M decoded data streams.
The implementation details of the functional units in the above apparatus have already been described in the foregoing JPEG decoding method embodiment; reference may be made to that description, which is not repeated here for brevity.
In a third aspect, based on the same inventive concept as the foregoing JPEG decoding method embodiment, an embodiment of the present invention further provides an electronic device including a memory, a main processor, an FPGA accelerator card, a DDR cache on the FPGA accelerator card, and a computer program stored in the memory and executable on the FPGA accelerator card, where the DDR cache obtains M pictures from the main processor, and M is an integer greater than 1; when the FPGA accelerator card executes the program, the steps of any implementation of the foregoing JPEG decoding method embodiment are performed on the M pictures.
In a fourth aspect, based on the same inventive concept as the foregoing JPEG decoding method embodiment, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any implementation of the foregoing JPEG decoding method embodiment.
According to the JPEG decoding method and apparatus provided by the embodiments of the present invention, the FPGA accelerator card allocates the M pictures read from the DDR to M JPEG decoders in one-to-one correspondence, and the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, obtaining M decoded data streams; the M decoded data streams are read and merged to obtain merged decoded data; and the merged decoded data is output to the DDR. Because multiple data paths share the same GMEM resource, parallelism is achieved at the level of whole JPEG files, which improves the efficiency of JPEG decoding on the FPGA accelerator card.
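The overall data flow summarized above (one decoder per picture, followed by an ordered merge) can be mimicked in software. The sketch below is a loose analogy and not the claimed hardware: threads stand in for the M JPEG decoders, the shared GMEM resource is not modeled, and `stand_in_decode` is a hypothetical placeholder that performs no real JPEG decoding.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
from typing import List


def stand_in_decode(picture: bytes) -> List[bytes]:
    """Hypothetical placeholder for one decoder: one output fragment per input byte."""
    return [bytes([value]).upper() for value in picture]


def decode_batch(pictures: List[bytes]) -> List[bytes]:
    """Decode M pictures in parallel and merge the M resulting streams."""
    if not pictures:
        return []
    # One worker per picture models the one-to-one picture/decoder allocation.
    with ThreadPoolExecutor(max_workers=len(pictures)) as pool:
        streams = list(pool.map(stand_in_decode, pictures))
    merged: List[bytes] = []
    # Take one fragment per stream per pass; exhausted streams yield None and are skipped.
    for one_pass in zip_longest(*streams):
        merged.extend(fragment for fragment in one_pass if fragment is not None)
    return merged


print(decode_batch([b"ab", b"c", b"def"]))
# [b'A', b'C', b'D', b'B', b'E', b'F']
```

Because `pool.map` returns results in input order, the merged output keeps a fixed stream order regardless of which worker finishes first.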
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (7)
1. A JPEG decoding method applied to an FPGA accelerator card, the method comprising:
reading M pictures from the DDR of the FPGA accelerator card, wherein M is an integer greater than 1, comprising: sequentially reading the data fragments of the M pictures from the DDR based on a preset read-in order and a data fragment size until the data of the M pictures has been read; wherein the sequentially reading the data fragments of the M pictures from the DDR based on the preset read-in order and the data fragment size comprises: judging a first overall state for the M pictures, wherein the first overall state indicates whether the M pictures have all been read in; if the first overall state is no, polling each of the M pictures to check whether it has data and, if the picture currently being checked has data, reading in the next data fragment from that picture and updating the data state of that picture; and updating the first overall state according to the data state of each of the M pictures, and returning to the step of judging the first overall state for the M pictures;
allocating the M pictures read from the DDR to M JPEG decoders in one-to-one correspondence, wherein the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, so as to obtain M decoded data streams;
reading and merging the M decoded data streams to obtain merged decoded data; and
outputting the merged decoded data to the DDR.
2. The method of claim 1, wherein said reading and merging said M decoded data streams to obtain merged decoded data comprises:
sequentially reading out the decoded data fragments from the M decoded data streams based on a preset read-out order; and
merging the decoded data fragments read out from the M decoded data streams to obtain the merged decoded data.
3. The method of claim 2, wherein said sequentially reading out the decoded data fragments from the M decoded data streams based on a preset read-out order comprises:
judging a second overall state for the M decoded data streams, wherein the second overall state indicates whether the M decoded data streams are all empty;
if not, reading out the next decoded data fragment from the i-th decoded data stream, wherein i takes the values 1 to M in sequence; and
updating the second overall state according to the state of each of the M decoded data streams, and returning to the step of judging the second overall state for the M decoded data streams.
4. A JPEG decoding device for use with an FPGA accelerator card, said device comprising:
an input unit, configured to read M pictures from the DDR of the FPGA accelerator card, wherein M is an integer greater than 1, including: sequentially reading the data fragments of the M pictures from the DDR based on a preset read-in order and a data fragment size until the data of the M pictures has been read; wherein the sequentially reading the data fragments of the M pictures from the DDR based on the preset read-in order and the data fragment size comprises: judging a first overall state for the M pictures, wherein the first overall state indicates whether the M pictures have all been read in; if the first overall state is no, polling each of the M pictures to check whether it has data and, if the picture currently being checked has data, reading in the next data fragment from that picture and updating the data state of that picture; and updating the first overall state according to the data state of each of the M pictures, and returning to the step of judging the first overall state for the M pictures;
a decoding unit, configured to allocate the M pictures read from the DDR to M JPEG decoders in one-to-one correspondence, wherein the M JPEG decoders share the same GMEM resource on the FPGA accelerator card to perform M-way parallel JPEG decoding on the M pictures, so as to obtain M decoded data streams; and
an output unit, configured to read and merge the M decoded data streams to obtain merged decoded data, and to output the merged decoded data to the DDR.
5. The apparatus of claim 4, wherein the output unit comprises:
a reading subunit, configured to sequentially read out the decoded data fragments from the M decoded data streams based on a preset read-out order; and
a merging subunit, configured to merge the decoded data fragments read out from the M decoded data streams to obtain the merged decoded data.
6. An electronic device comprising a memory, a main processor, an FPGA accelerator card, a DDR cache on the FPGA accelerator card, and a computer program stored in the memory and executable on the FPGA accelerator card, the DDR cache obtaining M pictures from the main processor, M being an integer greater than 1;
wherein, when the FPGA accelerator card executes the program, the steps of the method of any one of claims 1-3 are implemented for the M pictures.
7. A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011263958.1A CN112437303B (en) | 2020-11-12 | 2020-11-12 | JPEG decoding method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112437303A | 2021-03-02 |
CN112437303B | 2024-06-21 |
Family
ID=74699979
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011263958.1A Active CN112437303B (en) | 2020-11-12 | 2020-11-12 | JPEG decoding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112437303B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||