CN111190644A - Embedded Flash on-chip read instruction hardware acceleration method and device - Google Patents
- Publication number
- CN111190644A
- Authority
- CN
- China
- Prior art keywords
- instruction
- hardware
- embedded flash
- chip
- acceleration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30079—Pipeline control instructions, e.g. multicycle NOP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3812—Instruction prefetching with instruction modification, e.g. store into instruction stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
The invention discloses a hardware acceleration method and device for reading instructions from on-chip embedded Flash. The method comprises the following steps: classifying instructions at the front end of a processor pipeline, based on instruction predecoding and an instruction information classifier, by using a preset tightly coupled on-chip instruction-read hardware accelerator architecture; performing hardware acceleration of the first type of instruction based on instruction bit-width expansion and a hardware prefetching mechanism; performing hardware acceleration of the second type of instruction based on a high-precision hybrid branch predictor mechanism and a set-associative cache mechanism; constructing a parameterized ESL system model and a hardware circuit generator based on HLS, and performing performance analysis on key components of the system on chip; and adjusting the on-chip embedded Flash instruction read speed in real time through a main frequency hardware detector and a hardware adaptive dynamic switching strategy. With this method, the system on chip can obtain the best on-chip embedded Flash instruction read speed in different application scenarios, such as high-frequency and low-frequency operation, and the performance of the system on chip is effectively improved.
Description
Technical Field
The embodiments of the invention relate to the field of computer processors, in particular to a hardware acceleration method and device for reading instructions from on-chip embedded Flash, and further to an electronic device and a computer-readable storage medium.
Background
With the rapid development of computer technology, fast and efficient computer processors have become a research focus in this field. On-chip eFlash (i.e., embedded Flash) is widely used as the non-volatile memory module for instructions and data in low-power embedded SoC chip design. The processor is mainly responsible for tasks such as control, running the operating system platform, and general signal processing, while the eFlash stores instructions and data. The processor must access the eFlash to obtain the instructions and data it needs to complete the corresponding processing tasks, and instruction accesses are typically the more frequent. Whereas processor performance can be improved through instruction-level parallelism, superscalar design, and a large number of registers, eFlash performance can be improved only by a few means, such as process improvement. Therefore, as processor performance improves, the on-chip eFlash instruction fetch speed gradually becomes the bottleneck of SoC system performance, and the overall performance of the SoC is directly affected and limited by it.
The industry typically adopts cache-based acceleration schemes to narrow the performance gap between the processor and the memory. The advantage of a cache is that, at a high hit rate, it can significantly improve eFlash read performance, reduce the number of eFlash accesses, and lower system power consumption. However, a cache also has obvious drawbacks: 1. a large cache is costly and brings a large hardware overhead; 2. in application scenarios where a loop exceeds the cache capacity or where there are few loops, the cache hit rate is low and the acceleration is not significant.
With the increasing diversification of application scenarios such as AIoT, simple prefetching or caching alone cannot deliver the best acceleration. Research on how to improve the on-chip eFlash instruction read speed is therefore of great significance for improving the overall performance of the SoC system.
Disclosure of Invention
Therefore, the embodiments of the invention provide a hardware acceleration method for reading instructions from on-chip embedded Flash, to solve the problem in the prior art that the acceleration of the on-chip embedded Flash instruction read speed is not significant and cannot meet the current practical requirements of users.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a hardware acceleration method for reading instructions from on-chip embedded Flash, including: classifying instructions at the front end of a processor pipeline, based on instruction predecoding and an instruction information classifier, by using a preset tightly coupled on-chip instruction-read hardware accelerator architecture; performing hardware acceleration of the first type of instruction based on preset instruction bit-width expansion and a hardware prefetching mechanism; performing hardware acceleration of the second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set-associative cache mechanism; constructing a parameterized ESL system model and a hardware circuit generator based on HLS, and performing performance analysis on key components of the embedded Flash system on chip; and adjusting the on-chip embedded Flash instruction read speed in real time at the hardware level through a preset main frequency hardware detector and a hardware adaptive dynamic switching strategy.
Further, the hardware acceleration method for reading instructions from on-chip embedded Flash further includes: triggering the preset main frequency hardware detector; and formulating an acceleration strategy according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation, and accelerating the on-chip embedded Flash instruction read operation based on the acceleration strategy.
Further, formulating the acceleration strategy according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation specifically includes: determining the frequency type of the main frequency clock of the embedded Flash system on chip according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation; if the main frequency clock of the embedded Flash system on chip is low frequency, reading the on-chip embedded Flash instruction in a single clock cycle through the read operation acceleration strategy of the embedded Flash controller; if the main frequency clock of the embedded Flash system on chip is high frequency, determining the type of the instruction being executed through the flag bit of the instruction classifier; and if the instruction type is a sequential instruction, adopting the formulated sequential-instruction on-chip read acceleration strategy, and if the instruction type is a jump instruction, adopting the formulated jump-instruction on-chip read acceleration strategy.
Further, the first type of instruction is a sequential instruction.
Further, the second type instruction is a jump instruction.
In a second aspect, an embodiment of the present invention further provides a hardware acceleration device for reading instructions from on-chip embedded Flash, including: an instruction information classification unit, configured to classify instructions at the front end of the processor pipeline, based on instruction predecoding and an instruction information classifier, by using a preset tightly coupled on-chip instruction-read hardware accelerator architecture; a first hardware acceleration unit, configured to perform hardware acceleration of the first type of instruction based on preset instruction bit-width expansion and a hardware prefetching mechanism; a second hardware acceleration and closed-loop feedback unit, configured to perform hardware acceleration of the second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set-associative cache mechanism, to construct a parameterized ESL system model and a hardware circuit generator based on HLS, and to perform performance analysis on key components of the embedded Flash system on chip; and an adjusting unit, configured to adjust the on-chip embedded Flash instruction read speed in real time through a preset main frequency hardware detector and a hardware adaptive dynamic switching strategy.
Furthermore, the hardware acceleration device for reading instructions from on-chip embedded Flash further includes: a main frequency hardware detector triggering unit, configured to trigger the preset main frequency hardware detector; and an acceleration strategy determining unit, configured to formulate an acceleration strategy according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation, and to accelerate the on-chip embedded Flash instruction read operation based on the acceleration strategy.
Further, the acceleration strategy determining unit is specifically configured to: if the main frequency clock of the embedded Flash system on chip is low frequency, read the on-chip embedded Flash instruction in a single clock cycle through the read operation acceleration strategy of the embedded Flash controller; if the main frequency clock of the embedded Flash system on chip is high frequency, determine the type of the instruction being executed through the flag bit of the instruction classifier; and if the instruction type is a sequential instruction, adopt the formulated sequential-instruction on-chip read acceleration strategy, and if the instruction type is a jump instruction, adopt the formulated jump-instruction on-chip read acceleration strategy.
Further, the first type of instruction is a sequential instruction.
Further, the second type instruction is a jump instruction.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory; the memory is used for storing a program of the hardware acceleration method for reading instructions from on-chip embedded Flash, and after the electronic device is powered on and runs the program through the processor, it executes any one of the above hardware acceleration methods for reading instructions from on-chip embedded Flash.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium contains one or more program instructions, and the one or more program instructions are used to execute any one of the above hardware acceleration methods for reading instructions from on-chip embedded Flash.
By adopting the hardware acceleration method for reading instructions from on-chip embedded Flash, the system on chip can obtain the best on-chip embedded Flash instruction read speed at both high and low frequencies, thereby improving system performance under the different scenario requirements of embedded applications.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of a hardware acceleration method for a read instruction on an embedded Flash chip according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hardware acceleration apparatus for reading instructions on an embedded Flash chip according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention;
FIG. 4 is a flowchart of selecting and formulating an acceleration strategy in a hardware acceleration method for a read instruction on an embedded Flash chip according to an embodiment of the present invention.
Detailed Description
The present invention is described below by way of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from this disclosure. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the invention, and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
An embodiment of the hardware acceleration method for reading instructions from on-chip embedded Flash is described in detail below. FIG. 1 is a flowchart of the hardware acceleration method for a read instruction on an embedded Flash chip according to an embodiment of the present invention; the specific implementation process includes the following steps:
Step S101: Classify the instructions at the front end of the processor pipeline, based on instruction predecoding and an instruction information classifier, by using a preset tightly coupled on-chip instruction-read hardware accelerator architecture.
In the embodiment of the invention, the preset tightly coupled architecture is a tightly coupled on-chip Flash instruction-read hardware accelerator architecture; using it, the instructions at the front end of the processor pipeline are classified based on instruction predecoding and the instruction information classifier.
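As a non-limiting illustration of the classification in step S101, the sketch below derives a one-bit flag from a predecoded instruction word. The embodiment does not name an instruction set; the RV32I opcode values, the constant names, and the function `classify` are assumptions introduced only for this example.

```python
# Hypothetical predecode classifier. The RV32I major opcodes are an assumption
# made for illustration; the embodiment does not specify an instruction set.
SEQUENTIAL = 0  # flag value for the "first type" of instruction
JUMP = 1        # flag value for the "second type" of instruction

# Major opcodes (bits [6:0]) of the RV32I control-transfer instructions.
_JUMP_OPCODES = {
    0b1100011,  # BRANCH (beq, bne, blt, ...)
    0b1100111,  # JALR
    0b1101111,  # JAL
}


def classify(instruction_word: int) -> int:
    """Return the classifier flag bit for a 32-bit instruction word."""
    opcode = instruction_word & 0x7F  # predecode: extract the major opcode
    return JUMP if opcode in _JUMP_OPCODES else SEQUENTIAL


if __name__ == "__main__":
    print(classify(0x00000013))  # addi x0, x0, 0 (nop)
    print(classify(0x0000006F))  # jal x0, 0
```

In hardware, a flag of this kind would be a single bit travelling with the instruction, which is how the later strategy selection can read the instruction type without decoding it again.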
Step S102: Perform hardware acceleration of the first type of instruction based on preset instruction bit-width expansion and a hardware prefetching mechanism.
After the instructions at the front end of the processor pipeline have been classified in step S101, hardware acceleration of the first type of instruction can be performed in this step based on the preset instruction bit-width expansion and hardware prefetching mechanism. The first type of instruction may be a sequential instruction.
For the first type of instruction (i.e., sequential instructions): in a specific implementation, the hardware accelerator concatenates multiple embedded Flash slices to expand the read bit width, so that a single access to the embedded Flash reads out multiple instructions. The embedded Flash read operation for sequential instructions can reach a speed-up ratio of up to 16 times. Furthermore, by adding a suitable buffer under a hardware prefetching strategy, the hardware can prefetch instructions in advance while the CPU's instruction fetch is stalled, thereby hiding the latency of a single on-chip Flash read of sequential instructions. It should be noted that the embedded Flash slices may generally refer to 24 embedded Flash slices, which is not specifically limited herein.
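As a non-limiting illustration of bit-width expansion combined with prefetching, the following behavioural sketch counts Flash accesses and demand stalls for a straight-line instruction run. The class `PrefetchBuffer`, the function `flash_accesses`, the one-line-ahead prefetch policy, and the width of 16 instructions per access (chosen only to match the stated maximum speed-up ratio) are assumptions, not details given in the embodiment.

```python
def flash_accesses(num_instructions: int, instructions_per_read: int) -> int:
    """Flash accesses needed to fetch a straight-line run of instructions."""
    # Ceiling division: a partially used wide word still costs one access.
    return -(-num_instructions // instructions_per_read)


class PrefetchBuffer:
    """Hypothetical one-line-ahead prefetch buffer on a wide Flash read port."""

    def __init__(self, flash, width):
        self.flash = flash        # list of instruction words
        self.width = width        # instructions returned per Flash access
        self.line_base = None     # address of the first word in the current line
        self.line = []
        self.next_base = None     # address of the first word in the prefetched line
        self.next_line = []
        self.accesses = 0         # total Flash accesses (demand + prefetch)
        self.stalls = 0           # demand reads the CPU actually waits for

    def _read_line(self, base):
        self.accesses += 1
        return self.flash[base:base + self.width]

    def fetch(self, addr):
        base = addr - addr % self.width
        if base != self.line_base:
            if base == self.next_base:
                # The prefetched line is already available: no stall.
                self.line_base, self.line = self.next_base, self.next_line
            else:
                # Demand miss: the CPU waits for this Flash access.
                self.stalls += 1
                self.line_base, self.line = base, self._read_line(base)
            # Prefetch the following line while the CPU consumes this one.
            nxt = base + self.width
            if nxt < len(self.flash):
                self.next_base, self.next_line = nxt, self._read_line(nxt)
            else:
                self.next_base, self.next_line = None, []
        return self.line[addr - base]


if __name__ == "__main__":
    flash = list(range(64))
    buf = PrefetchBuffer(flash, width=16)
    for a in range(64):
        buf.fetch(a)
    print(flash_accesses(64, 1), buf.accesses, buf.stalls)
```

Under these assumptions, the ratio of narrow to wide accesses equals the number of instructions per wide read, and only the first line of a sequential run stalls the CPU.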
Step S103: Perform hardware acceleration of the second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set-associative cache mechanism; and construct a parameterized ESL system model and a hardware circuit generator based on HLS, and perform performance analysis on key components of the embedded Flash system on chip.
After the hardware acceleration of the first type of instruction in step S102, hardware acceleration of the second type of instruction may be performed in this step based on the preset high-precision hybrid branch predictor mechanism and set-associative cache mechanism. In addition, a parameterized ESL system model and a hardware circuit generator can be constructed based on HLS, and performance analysis can be performed on key components of the embedded Flash system on chip, thereby achieving closed-loop feedback. The second type of instruction may be a jump instruction.
For the second type of instruction (i.e., jump instructions): in a specific implementation, the CPU occupies only a small proportion of the area of the embedded Flash system on chip, so a moderate increase in area can be traded for a large improvement in system-on-chip performance. In the embodiment of the invention, a high-precision hybrid branch predictor mechanism formed by a preset TAGE branch predictor, a Loop predictor, and the like can achieve a branch prediction accuracy of about 99.1 percent. This reduces, at the first stage of the tightly coupled instruction-read hardware accelerator architecture described in step S101, the instruction misses caused by branch mispredictions.
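The embodiment names the TAGE and Loop predictors without giving their structure. As a non-limiting sketch of the loop-predictor idea only (TAGE is not modelled here), the class below learns the trip count of a loop branch and predicts its exit; `LoopPredictor`, `count_mispredictions`, and the default-taken policy for unseen branches are assumptions introduced for illustration.

```python
class LoopPredictor:
    """Hypothetical, simplified loop predictor keyed by branch address."""

    def __init__(self):
        self.trip = {}   # pc -> learned number of taken outcomes before an exit
        self.count = {}  # pc -> taken outcomes seen in the current loop run

    def predict(self, pc):
        trip = self.trip.get(pc)
        if trip is None:
            return True  # no history yet: assume the loop branch is taken
        # Predict taken until the learned trip count is reached, then exit.
        return self.count.get(pc, 0) < trip

    def update(self, pc, taken):
        if taken:
            self.count[pc] = self.count.get(pc, 0) + 1
        else:
            # Loop exit: remember how many iterations preceded it.
            self.trip[pc] = self.count.get(pc, 0)
            self.count[pc] = 0


def count_mispredictions(predictor, pc, outcomes):
    """Replay a sequence of branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken three times and then not taken, executed three times.
    outcomes = ([True] * 3 + [False]) * 3
    print(count_mispredictions(LoopPredictor(), 0x100, outcomes))
```

With a fixed trip count, only the first loop exit is mispredicted in this sketch, which is the behaviour that makes a loop component a useful complement to a history-based predictor.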
A branch cache strategy is adopted for the jump instructions decoded by the preset instruction information analyzer; by reducing the access latency caused by a prefetch failure and by correcting the prefetch address, the next prefetch operation can continue, thereby accelerating on-chip embedded Flash instruction reads.
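As a non-limiting illustration of a set-associative cache used as a branch cache, the sketch below stores instruction lines keyed by jump-target line address. The class name `SetAssociativeCache`, the modulo set indexing, and the LRU replacement policy are assumptions; the embodiment does not specify the cache organization or its replacement policy.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical set-associative branch cache with LRU replacement."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered map per set; the oldest entry is the LRU victim.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def lookup(self, line_addr):
        """Return the cached line for a jump target, or None on a miss."""
        entries = self.sets[line_addr % self.num_sets]
        tag = line_addr // self.num_sets
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return entries[tag]
        self.misses += 1
        return None

    def fill(self, line_addr, line):
        """Install a line fetched from Flash after a miss."""
        entries = self.sets[line_addr % self.num_sets]
        tag = line_addr // self.num_sets
        if tag in entries:
            entries.move_to_end(tag)
        elif len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used way
        entries[tag] = line


if __name__ == "__main__":
    cache = SetAssociativeCache(num_sets=2, ways=2)
    cache.fill(0, "line0")
    cache.fill(2, "line2")
    print(cache.lookup(0), cache.lookup(4))
```

A hit returns the target line without a Flash access, so the prefetcher can be redirected to the corrected address immediately; a miss falls back to a demand read followed by `fill`.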
In addition, when constructing the parameterized ESL system model and the parameterized hardware circuit generator based on HLS and analyzing the performance of the key components of the embedded Flash system on chip, the accuracy and the simulation speed for the on-chip embedded Flash instruction-read hardware acceleration device are the key concerns of the ESL system model's performance analysis.
In the embodiment of the present invention, constructing the parameterized electronic system level (ESL) system model and the hardware circuit generator based on high-level synthesis (HLS) may specifically include: performing parameterized ESL modeling of the processor in a high-level language, and converting the ESL model into RTL code through an HLS hardware generator; further, the RTL code of the high-performance core is used as the ESL system model for CPU performance analysis. A dedicated performance monitoring unit implemented with a plurality of hardware counters can be added outside the high-performance core to enable performance analysis of the processor at the microarchitecture level. The preset performance monitoring unit counts events from the instruction fetch component, the predecoding component, the branch predictor component, the instruction decoding component, the instruction dispatch component, and other components of the processor, and outputs the counting results to a key data summarizing component, thereby achieving closed-loop feedback for the performance analysis of key components or data paths.
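As a non-limiting illustration of the performance monitoring unit, the following software model keeps one counter per event and produces a summary that a designer could feed back into the model parameters. The class `PerformanceMonitor`, the event names, and the derived `branch_accuracy` metric are assumptions chosen for this example.

```python
from collections import Counter


class PerformanceMonitor:
    """Hypothetical software model of a counter-based performance monitor."""

    # Event names are illustrative; they mirror the components listed above.
    EVENTS = (
        "fetch",
        "predecode",
        "branch_predict",
        "branch_mispredict",
        "decode",
        "dispatch",
    )

    def __init__(self):
        self.counters = Counter()

    def record(self, event, n=1):
        """Increment the counter of one monitored event."""
        if event not in self.EVENTS:
            raise ValueError(f"unknown event: {event}")
        self.counters[event] += n

    def summary(self):
        """Summarize raw counts plus one derived metric for feedback."""
        predicted = self.counters["branch_predict"]
        missed = self.counters["branch_mispredict"]
        accuracy = None if predicted == 0 else 1 - missed / predicted
        return {
            "counts": {e: self.counters[e] for e in self.EVENTS},
            "branch_accuracy": accuracy,
        }


if __name__ == "__main__":
    pmu = PerformanceMonitor()
    pmu.record("branch_predict", 1000)
    pmu.record("branch_mispredict", 9)
    print(pmu.summary()["branch_accuracy"])
```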
Step S104: Adjust the on-chip embedded Flash instruction read speed through a preset main frequency hardware detector and a hardware adaptive dynamic switching strategy.
After the hardware acceleration of the second type of instruction and the performance analysis of the key components of the embedded Flash system on chip in step S103, the on-chip embedded Flash instruction read speed can be dynamically adjusted in real time at the hardware level in step S104 by using the preset main frequency hardware detector and the hardware adaptive dynamic switching strategy.
FIG. 4 is a flowchart of selecting and formulating an acceleration strategy in a hardware acceleration method for a read instruction on an embedded Flash chip according to an embodiment of the present invention.
In the embodiment of the invention, the method further includes triggering the preset main frequency hardware detector, formulating an acceleration strategy according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation, and accelerating the on-chip embedded Flash instruction read operation based on the acceleration strategy. Formulating the acceleration strategy according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation may include: determining the frequency type of the main frequency clock of the embedded Flash system on chip according to the main frequency hardware detector and the timing of the on-chip embedded Flash read operation; if the main frequency clock of the embedded Flash system on chip is low frequency, reading the on-chip embedded Flash instruction in a single clock cycle through the read operation acceleration strategy of the embedded Flash controller; if the main frequency clock of the embedded Flash system on chip is high frequency, determining the type of the instruction being executed through the flag bit of the instruction classifier; and if the instruction type is a sequential instruction, adopting the formulated sequential-instruction on-chip read acceleration strategy, and if the instruction type is a jump instruction or another instruction, adopting the formulated on-chip read acceleration strategy for jump instructions or other instructions.
Specifically, after the Flash system on chip is initialized, the main frequency hardware detector is triggered and started, and a suitable acceleration strategy is formulated according to the main frequency and the read operation timing of the on-chip embedded Flash. If the main frequency clock of the Flash system on chip is low frequency, then under the read operation acceleration strategy of the embedded Flash controller, after the bus of the on-chip interconnection network sends a read address and a read request, the AE control signal of the embedded Flash is pulled high on the next rising clock edge and pulled low on the falling edge of that clock, so that the embedded Flash completes the read operation within one clock cycle and single-cycle reading of on-chip instructions is achieved. If the main frequency clock of the Flash system on chip is high frequency, the type of the instruction being executed is determined through the flag bit of the instruction classifier. If the instruction type is a sequential instruction, the sequential-instruction on-chip read acceleration strategy is adopted; if the instruction type is a jump instruction, the jump-instruction on-chip read acceleration strategy is adopted.
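As a non-limiting illustration of the strategy selection described above, the sketch below maps the detected clock frequency and the classifier flag to one of three strategies. The embodiment does not define the boundary between low and high frequency; treating the clock as low frequency when the Flash read access time fits within one clock period is an assumption, as are the names `Strategy`, `is_low_frequency`, and `select_strategy`.

```python
from enum import Enum


class Strategy(Enum):
    SINGLE_CYCLE_READ = "single-cycle read by the embedded Flash controller"
    SEQUENTIAL_ACCEL = "sequential-instruction on-chip read acceleration"
    JUMP_ACCEL = "jump-instruction on-chip read acceleration"


def is_low_frequency(clock_hz: float, flash_read_time_ns: float) -> bool:
    """Assumed criterion: one clock period is long enough for a Flash read."""
    period_ns = 1e9 / clock_hz
    return period_ns >= flash_read_time_ns


def select_strategy(clock_hz: float, flash_read_time_ns: float,
                    is_jump: bool) -> Strategy:
    """Choose an acceleration strategy from the detector and classifier flag."""
    if is_low_frequency(clock_hz, flash_read_time_ns):
        # Low frequency: the instruction type does not matter.
        return Strategy.SINGLE_CYCLE_READ
    # High frequency: dispatch on the classifier flag bit.
    return Strategy.JUMP_ACCEL if is_jump else Strategy.SEQUENTIAL_ACCEL


if __name__ == "__main__":
    # Hypothetical numbers: a 40 ns Flash read at 10 MHz and at 200 MHz.
    print(select_strategy(10e6, 40, is_jump=False).name)
    print(select_strategy(200e6, 40, is_jump=False).name)
    print(select_strategy(200e6, 40, is_jump=True).name)
```

In hardware this decision would be re-evaluated whenever the detector reports a change in the main frequency, which is what makes the switching adaptive rather than fixed at design time.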
By adopting the hardware acceleration method for reading instructions from on-chip embedded Flash, the system on chip can obtain, through hardware adaptive dynamic switching, the best on-chip embedded Flash instruction read speed at both high and low frequencies, thereby improving system performance under the different scenario requirements of embedded applications.
Corresponding to the above hardware acceleration method for reading instructions from on-chip embedded Flash, the invention also provides a hardware acceleration device for reading instructions from on-chip embedded Flash. Because the device embodiment is similar to the method embodiment, it is described relatively briefly; for relevant details, reference may be made to the description of the method embodiment. The device embodiment described below is merely illustrative. FIG. 2 is a schematic diagram of a hardware acceleration apparatus for reading instructions on an embedded Flash chip according to an embodiment of the present invention.
The embedded Flash on-chip read instruction hardware acceleration device of the invention comprises the following parts:
The instruction information classification unit 201 is configured to classify instructions at the front end of a processor pipeline, based on instruction predecoding and an instruction information classifier, by using a preset tightly coupled on-chip instruction-read hardware accelerator architecture.
The first hardware acceleration unit 202 is configured to perform hardware acceleration of the first type of instruction based on preset instruction bit-width expansion and a hardware prefetching mechanism.
After the instruction information classification unit 201 has classified the instructions at the front end of the processor pipeline, the first hardware acceleration unit 202 may perform hardware acceleration of the first type of instruction based on the preset instruction bit-width expansion and hardware prefetching mechanism. The first type of instruction may be a sequential instruction.
For the first type of instruction (i.e., sequential instructions): in a specific implementation, the hardware accelerator concatenates multiple embedded Flash slices to expand the read bit width, so that a single access to the embedded Flash reads out multiple instructions. The embedded Flash read operation for sequential instructions can reach a speed-up ratio of up to 16 times. Furthermore, by adding a suitable buffer under a hardware prefetching strategy, the hardware can prefetch instructions in advance while the CPU's instruction fetch is stalled, thereby hiding the latency of a single on-chip Flash read of sequential instructions. It should be noted that the embedded Flash slices may generally refer to 24 embedded Flash slices, which is not specifically limited herein.
A second hardware acceleration and closed-loop feedback unit 203, configured to accelerate hardware of a second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set-associative cache mechanism; and constructing a parameterized ESL system model and a hardware circuit generator based on HLS, and performing performance analysis on the key components of the embedded Flash on-chip system.
After the first hardware acceleration unit 202 has performed hardware acceleration of the first type of instruction, the second hardware acceleration and closed-loop feedback unit 203 may perform hardware acceleration of the second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set-associative cache mechanism. In addition, a parameterized ESL system model and a hardware circuit generator can be constructed based on HLS, and performance analysis can be carried out on the key components of the embedded Flash on-chip system, thereby achieving closed-loop feedback. The second type of instruction may be a jump instruction, or another kind of non-sequential instruction.
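To make the set-associative cache mechanism more concrete, the following is a minimal behavioural sketch of a small instruction cache in front of the Flash. It is an illustration only: the class name `SetAssociativeCache`, the geometry (4 sets, 2 ways, 16-byte lines) and the LRU replacement policy are assumptions, since the text does not specify them, and the hybrid branch predictor is not modelled here.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Behavioural model of a set-associative cache with LRU replacement
    (geometry and replacement policy are illustrative assumptions)."""

    def __init__(self, num_sets: int = 4, ways: int = 2, line_bytes: int = 16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used tag
        entries[tag] = True
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache()
    # A loop that jumps back and forth between two targets in the same set.
    trace = [0x000, 0x100, 0x000, 0x100, 0x200, 0x100]
    print([cache.access(a) for a in trace])
    print(cache.hits, cache.misses)
```

In this sketch, repeated jumps between two targets that map to the same set hit in the cache after the first pass, which is the situation in which a jump would otherwise pay the full Flash read latency.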
The adjusting unit 204 is configured to adjust the instruction reading speed on the embedded Flash chip through a preset main frequency hardware detector and a hardware adaptive dynamic switching strategy.
After the hardware acceleration of the second type of instruction and the performance analysis of the key components of the embedded Flash on-chip system in step S103, the adjusting unit 204 can adjust the instruction reading speed on the embedded Flash chip through the preset main frequency hardware detector and the hardware adaptive dynamic switching strategy.
The embodiment of the invention further includes triggering the preset main frequency hardware detector, formulating an acceleration strategy according to the main frequency hardware detector and the timing of the read operation on the embedded Flash chip, and accelerating the read instruction operation on the embedded Flash chip based on the acceleration strategy. Formulating the acceleration strategy according to the main frequency hardware detector and the timing of the read operation on the embedded Flash chip may include: judging the frequency type of the master frequency clock of the embedded Flash on-chip system according to the main frequency hardware detector and the timing of the read operation on the embedded Flash chip; if the master frequency clock of the embedded Flash on-chip system is low frequency, reading the instruction on the embedded Flash chip in a single clock cycle through the read operation acceleration strategy of the embedded Flash controller; if the master frequency clock of the embedded Flash on-chip system is high frequency, determining the type of the instruction being executed on the embedded Flash chip from the flag bit of the instruction classifier; if the instruction type is a sequential instruction, adopting the established sequential instruction read acceleration strategy, and if the instruction type is a jump instruction, adopting the established jump instruction read acceleration strategy.
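The strategy selection described above can be sketched as a small decision function. This is a hedged illustration rather than the actual hardware logic: the enum and function names are hypothetical, and the criterion used in `classify_clock` (the clock counts as low frequency when one clock period is at least as long as the Flash read time) is an assumption, since the text only states that the judgment is based on the detector and the Flash read timing.

```python
from enum import Enum


class ClockClass(Enum):
    LOW = "low"
    HIGH = "high"


class InstructionKind(Enum):
    SEQUENTIAL = "sequential"
    JUMP = "jump"


class Strategy(Enum):
    SINGLE_CYCLE_READ = "single-cycle read via the Flash controller"
    SEQUENTIAL_ACCEL = "sequential instruction read acceleration"
    JUMP_ACCEL = "jump instruction read acceleration"


def classify_clock(clock_hz: float, flash_read_ns: float) -> ClockClass:
    """Assumed criterion: the clock is 'low' when a Flash read fits within
    one clock period, so the read can complete in a single cycle."""
    period_ns = 1e9 / clock_hz
    return ClockClass.LOW if period_ns >= flash_read_ns else ClockClass.HIGH


def select_strategy(clock: ClockClass, kind: InstructionKind) -> Strategy:
    """Mirror the decision described in the text: at low frequency the
    instruction type is irrelevant; at high frequency the classifier's
    flag bit (modelled here as `kind`) picks the strategy."""
    if clock is ClockClass.LOW:
        return Strategy.SINGLE_CYCLE_READ
    if kind is InstructionKind.SEQUENTIAL:
        return Strategy.SEQUENTIAL_ACCEL
    return Strategy.JUMP_ACCEL


if __name__ == "__main__":
    clock = classify_clock(clock_hz=100e6, flash_read_ns=40.0)
    print(clock, select_strategy(clock, InstructionKind.JUMP))
```

The example values (a 100 MHz clock and a 40 ns Flash read time) are made up for the usage example and are not taken from the patent.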
With the embedded Flash on-chip read instruction hardware acceleration device of the invention, the system on chip can obtain the optimal on-chip embedded Flash instruction reading speed at both high and low frequencies, which improves system performance under the different scenario requirements of embedded applications.
Corresponding to the above hardware acceleration method for embedded Flash on-chip read instructions, the invention also provides an electronic device. Because the embodiment of the electronic device is similar to the method embodiment, it is described only briefly; for details, refer to the description of the method embodiment. The electronic device described below is merely illustrative. Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention.
The electronic device specifically includes a processor 301 and a memory 302. The memory 302 is configured to store one or more program instructions, namely the program to be run by the processor; after the electronic device is powered on and the processor 301 runs this program, the embedded Flash on-chip read instruction hardware acceleration method is executed. The electronic device of the present invention may be a program execution device in a computer.
Corresponding to the above hardware acceleration method for embedded Flash on-chip read instructions, the invention also provides a computer storage medium. Because the embodiment of the computer storage medium is similar to the method embodiment, it is described only briefly; for details, refer to the description of the method embodiment. The computer storage medium described below is merely illustrative.
The computer storage medium contains one or more program instructions, and the one or more program instructions are used for executing the embedded Flash on-chip read instruction hardware acceleration method.
In an embodiment of the invention, the processor or processor module may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other Programmable logic device.
The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present invention may be embodied directly in the hardware of the processor pipeline, or in a combination of hardware and software modules within the processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the present invention may be implemented in a combination of hardware and software. When software is used, the corresponding functionality may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The embodiments above describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (10)
1. A hardware acceleration method for reading instructions on an embedded Flash chip is characterized by comprising the following steps:
classifying the instructions at the front end of a processor pipeline by utilizing a preset tightly-coupled on-chip instruction reading hardware accelerator architecture based on instruction predecoding and an instruction information classifier;
accelerating hardware of the first type of instruction based on preset instruction bit width expansion and a hardware prefetching mechanism;
accelerating the hardware of the second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set associative cache mechanism; constructing a parameterized ESL system model and a hardware circuit generator based on HLS, and performing performance analysis on key components of the embedded Flash on-chip system;
and adjusting the instruction reading speed on the embedded Flash chip in real time at a hardware level through a preset main frequency hardware detector and a hardware self-adaptive dynamic switching strategy.
2. The embedded Flash on-chip read instruction hardware acceleration method of claim 1, further comprising:
triggering a preset main frequency hardware detector;
and establishing an acceleration strategy according to the main frequency hardware detector and the time sequence of the read operation on the embedded Flash chip, and accelerating the read instruction operation on the embedded Flash chip based on the acceleration strategy.
3. The method for accelerating the hardware of the read instruction on the embedded Flash chip according to claim 2, wherein the step of formulating an acceleration strategy according to the main frequency hardware detector and the time sequence of the read operation on the embedded Flash chip specifically comprises the steps of:
judging the frequency type of a master frequency clock of the system on the embedded Flash chip according to the master frequency hardware detector and the time sequence of the read operation on the embedded Flash chip;
if the master frequency clock of the system on the embedded Flash chip is low frequency, reading the instruction on the embedded Flash chip by a single clock period through a read operation acceleration strategy of an embedded Flash controller;
if the master frequency clock of the system on the embedded Flash chip is high frequency, determining the type of the instruction executed on the embedded Flash chip through the flag bit of the instruction classifier; and if the instruction type is a sequential instruction, adopting the established sequential instruction read acceleration strategy, and if the instruction type is a jump instruction, adopting the established jump instruction read acceleration strategy.
4. The embedded Flash on-chip read instruction hardware acceleration method of claim 1, characterized in that the first type instruction is a sequential instruction.
5. The embedded Flash on-chip read instruction hardware acceleration method of claim 1, characterized in that the second type instruction is a jump instruction.
6. An embedded Flash on-chip read instruction hardware accelerator, comprising:
the instruction information classification unit is used for classifying the instructions at the front end of the processor pipeline based on instruction predecoding and an instruction information classifier by utilizing a preset tightly-coupled on-chip read instruction hardware accelerator architecture;
the first hardware acceleration unit is used for accelerating the hardware of the first type of instruction based on preset instruction bit width expansion and a hardware prefetching mechanism;
the second hardware acceleration and closed-loop feedback unit is used for accelerating the hardware of the second type of instruction based on a preset high-precision hybrid branch predictor mechanism and a set-associative cache mechanism; constructing a parameterized ESL system model and a hardware circuit generator based on HLS, and performing performance analysis on key components of the embedded Flash on-chip system;
and the adjusting unit is used for adjusting the instruction reading speed on the embedded Flash chip in real time at a hardware level through a preset main frequency hardware detector and a hardware self-adaptive dynamic switching strategy.
7. The embedded Flash on-chip read instruction hardware acceleration device of claim 6, further comprising:
the main frequency hardware detector triggering unit is used for triggering the preset main frequency hardware detector;
and the acceleration strategy determining unit is used for making an acceleration strategy according to the main frequency hardware detector and the time sequence of the read operation on the embedded Flash chip and accelerating the read instruction operation on the embedded Flash chip based on the acceleration strategy.
8. The embedded Flash on-chip instruction-reading hardware acceleration device of claim 7, wherein the acceleration policy determination unit is specifically configured to:
if the master frequency clock of the system on the embedded Flash chip is low frequency, reading the instruction on the embedded Flash chip by a single clock period through a read operation acceleration strategy of an embedded Flash controller;
if the master frequency clock of the system on the embedded Flash chip is high frequency, determining the type of the instruction executed on the embedded Flash chip through the flag bit of the instruction classifier; if the instruction type is a sequential instruction, adopting the established sequential instruction read acceleration strategy; and if the instruction type is a jump instruction, adopting the established jump instruction read acceleration strategy.
9. An electronic device, comprising:
a processor; and
a memory for storing a program of the embedded Flash on-chip read instruction hardware acceleration method, wherein after the electronic device is powered on and runs the program through the processor, the embedded Flash on-chip read instruction hardware acceleration method of any one of claims 1-5 is executed.
10. A computer-readable storage medium, characterized in that the computer storage medium contains one or more program instructions, the one or more program instructions being used by a server to execute the embedded Flash on-chip read instruction hardware acceleration method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911381757.9A CN111190644A (en) | 2019-12-27 | 2019-12-27 | Embedded Flash on-chip read instruction hardware acceleration method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111190644A (en) | 2020-05-22 |
Family
ID=70707871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911381757.9A Pending CN111190644A (en) | 2019-12-27 | 2019-12-27 | Embedded Flash on-chip read instruction hardware acceleration method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111190644A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1991792A (en) * | 2005-09-30 | 2007-07-04 | 英特尔公司 | Instruction-assisted cache management for efficient use of cache and memory |
CN101266577A (en) * | 2008-03-27 | 2008-09-17 | 上海交通大学 | Programmable on-chip memorizer interface NOR flash memory reading quickening control method |
US20110066837A1 (en) * | 2000-01-06 | 2011-03-17 | Super Talent Electronics Inc. | Single-Chip Flash Device with Boot Code Transfer Capability |
Non-Patent Citations (2)
Title |
---|
刘桂华: "《基于FPGA的现代数字系统设计》", 30 September 2012 * |
蒋进松 等: "基于预取和缓存原理的片上Flash加速控制器设计", 《计算机工程与科学》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112540902A (en) * | 2020-12-03 | 2021-03-23 | 山东云海国创云计算装备产业创新中心有限公司 | Method, device and equipment for testing performance of system on chip and readable storage medium |
CN112540902B (en) * | 2020-12-03 | 2023-03-14 | 山东云海国创云计算装备产业创新中心有限公司 | Method, device and equipment for testing performance of system on chip and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101594090B1 (en) | Processors, methods, and systems to relax synchronization of accesses to shared memory | |
US10949260B2 (en) | Execution time prediction for energy-efficient computer systems | |
US8612944B2 (en) | Code evaluation for in-order processing | |
US10303608B2 (en) | Intelligent data prefetching using address delta prediction | |
US11636122B2 (en) | Method and apparatus for data mining from core traces | |
US20130185515A1 (en) | Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher | |
EP3335109A1 (en) | Determining prefetch instructions based on instruction encoding | |
US20160147516A1 (en) | Execution of complex recursive algorithms | |
JP2010026716A (en) | Cache memory control circuit and processor | |
CN102722451B (en) | Device for accessing cache by predicting physical address | |
CN111190644A (en) | Embedded Flash on-chip read instruction hardware acceleration method and device | |
US9158545B2 (en) | Looking ahead bytecode stream to generate and update prediction information in branch target buffer for branching from the end of preceding bytecode handler to the beginning of current bytecode handler | |
Haque et al. | Susesim: a fast simulation strategy to find optimal l1 cache configuration for embedded systems | |
US10372902B2 (en) | Control flow integrity | |
CN114661350A (en) | Apparatus, system, and method for concurrently storing multiple PMON counts in a single register | |
CN102541738B (en) | Method for accelerating soft error resistance test of multi-core CPUs (central processing units) | |
CN107769987B (en) | Message forwarding performance evaluation method and device | |
Haque et al. | Dew: A fast level 1 cache simulation approach for embedded processors with fifo replacement policy | |
US20050050534A1 (en) | Methods and apparatus to pre-execute instructions on a single thread | |
CN111158754A (en) | Processor branch prediction method and device for balancing prediction precision and time delay | |
CN118227446B (en) | Cache performance evaluation method and device, electronic equipment and readable storage medium | |
US20230185695A1 (en) | Processor trace with suppression of periodic timing packets for low density trace sections | |
US20050223203A1 (en) | Segmented branch predictor | |
CN112579169B (en) | Method and device for generating processor trace stream | |
US11860762B2 (en) | Semiconductor device, control flow inspection method, non-transitory computer readable medium, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200522 |