CN118605909A - A hardware development system based on RISC-V - Google Patents

A hardware development system based on RISC-V

Info

Publication number
CN118605909A
Authority
CN
China
Prior art keywords
module
instruction
processor
function
risc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410770378.3A
Other languages
Chinese (zh)
Inventor
倪家哲
黄龙涛
钟世达
张沛昌
袁涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202410770378.3A
Publication of CN118605909A
Pending (current legal status)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/60: Software deployment
    • G06F8/65: Updates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/53: Decompilation; Disassembly
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention provides a RISC-V-based hardware development system comprising an instruction tracing module, a waveform slicing module, a deadlock detection module, a memory access tracing module, a function call tracing module, a peripheral access tracing module, an instruction simulator, a differential debugging comparison module, a comprehensive debugging module, and a calling module. The number of cycles the waveform slicing module records in each slice is configurable, which greatly reduces the space occupied by the generated waveform files. The differential debugging comparison module adapts to the number of cycles each instruction takes to execute, so it can be used with both single-cycle processors and processors with multi-stage pipelines. Designing a RISC-V processor with this hardware development system can greatly improve the efficiency of processor development and debugging.

Description

A hardware development system based on RISC-V

Technical Field

The present invention relates to hardware development systems, and in particular to a hardware development system based on RISC-V.

Background Art

With the continuous development of processor technology, the RISC-V instruction set, an open and extensible instruction set architecture, has gradually attracted industry attention. In RISC-V processor design, developing and debugging flexibly and efficiently, and locating a faulty instruction quickly and accurately, are important goals.

Unlike general ASIC design, simulating a processor means executing instructions. Even a simple test program often executes tens of thousands or even hundreds of thousands of instructions, and the generated waveform can reach tens to hundreds of GB. The effect of an incorrectly executed instruction often shows up only tens or even hundreds of cycles later, which makes the faulty location hard to find. In addition, processor simulation does not by itself keep a complete record of the executed instructions, and finding the faulty instruction may require observing the waveforms of dozens of signals inside the processor core at once. Debugging in this way is very inefficient.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a RISC-V-based hardware development system that improves the efficiency of hardware development.

To solve the above technical problem, the present invention adopts the following technical solution: a RISC-V-based hardware development system comprising an instruction tracing module, a waveform slicing module, a deadlock detection module, a memory access tracing module, a function call tracing module, a peripheral access tracing module, an instruction simulator, a differential debugging comparison module, a comprehensive debugging module, and a calling module;

After the processor is connected to the RISC-V-based hardware development system, advancing the processor by one cycle is encapsulated in the single_cycle() function, and the calling module invokes the waveform slicing module once inside single_cycle() to perform waveform slicing;

After the processor completes reset and the initialization of information, the main loop of the comprehensive debugging module is entered, and the comprehensive debugging module issues an execute command that calls the cpu_exec() function so that the processor executes instructions;

The calling module updates the program counter and calls the function call tracing module to determine whether the instruction at the current program counter address is a function call or a function return;

The calling module calls single_cycle() once to advance the processor by one cycle, and then calls get_cpu_signals() to obtain the relevant signals from the processor;

After single_cycle(), the calling module calls the deadlock detection module to check whether the processor has deadlocked;

If the processor accesses memory or a peripheral during execution, the calling module uses the DPI-C mechanism to call the memory access tracing module and the peripheral access tracing module, which notify the development platform and write the relevant information to the log file;

After an instruction finishes executing, the calling module calls the instruction tracing module to record information about the executed instruction;

The calling module calls the differential debugging comparison module to verify that the processor executed correctly. The differential debugging comparison module tells the instruction simulator to execute one instruction, then obtains the instruction simulator's register file state and compares it with the processor's register file state. If the states match, the program counter is updated again and execution continues with the next instruction; if they differ, the simulation exits and the relevant error information is output.

Further, a commit module is provided in the write-back stage of the processor, and it commits an instruction only when the instruction in the previous cycle differs from the instruction in the current cycle.

Further, the instruction tracing module is specifically configured to write the program counter address and the instruction into the logbuf string, call the LLVM disassembler with the instruction as input, and append the disassembled text to logbuf, which completes the record of one executed instruction.

Further, the waveform slicing module is specifically configured to record the waveforms of the most recent N cycles. If the processor executes correctly during that period, the waveforms of those N cycles are discarded and recording restarts for the next N cycles.

Further, the deadlock detection module is specifically configured to record how many cycles the currently committed instruction has been running. If the current instruction runs for more than a preset number of cycles, the processor is judged to have deadlocked, and the hardware development system exits the simulation and outputs the relevant debugging information.

Further, the memory access tracing module is specifically configured to output information to the log file according to the type of memory access instruction, the accessed address, and the accessed data, and to record the PC address at the time of the access.

Further, the function call tracing module is specifically configured to read the ELF file of the corresponding program according to the current instruction, disassemble it, and output the current function call information to the log file.

Further, the peripheral access tracing module is specifically configured to determine from the address of the current memory access whether it is a peripheral access, and to output the accessed peripheral address, the peripheral name, and the accessed data.

Further, the differential debugging comparison module is specifically configured to determine, from the register state of the processor and of the instruction simulator after each instruction, whether the current instruction executed correctly, and to return the result to the hardware development system.

Further, the comprehensive debugging module is specifically configured to support single-stepping the processor, printing registers, setting watchpoints, evaluating expressions, scanning memory, and exiting the simulation.

The beneficial effects of the present invention are as follows: with the RISC-V-based hardware development system, the processor's execution trace can be recorded completely, deadlocks are detected automatically, waveform file size is reduced, the location of a faulty instruction is found automatically, and several debugging methods are available for flexible development, which greatly improves the efficiency of processor design.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from the structures shown in them without creative effort.

FIG. 1 is an overall architecture diagram of the RISC-V-based hardware development system according to an embodiment of the present invention;

FIG. 2 is an execution flowchart of the RISC-V-based hardware development system according to an embodiment of the present invention;

FIG. 3 is an overall architecture diagram of the instruction simulator according to an embodiment of the present invention;

FIG. 4 is a flowchart of information initialization according to an embodiment of the present invention;

FIG. 5 is an instruction execution flowchart of the instruction simulator according to an embodiment of the present invention;

FIG. 6 is an overall architecture diagram of differential debugging comparison according to an embodiment of the present invention;

FIG. 7 is an execution flowchart of differential debugging comparison according to an embodiment of the present invention;

FIG. 8 is an architecture diagram of the comprehensive debugging module according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Note that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and are not to be understood as indicating or implying relative importance or the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, provided that a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, that combination is deemed not to exist and falls outside the scope of protection claimed by the present invention.

As shown in FIG. 1, an embodiment of the present invention is a RISC-V-based hardware development system comprising an instruction tracing module, a waveform slicing module, a deadlock detection module, a memory access tracing module, a function call tracing module, a peripheral access tracing module, an instruction simulator, a differential debugging comparison module, a comprehensive debugging module, and a calling module;

After the processor is connected to the RISC-V-based hardware development system, advancing the processor by one cycle is encapsulated in the single_cycle() function, and the calling module invokes the waveform slicing module once inside single_cycle() to perform waveform slicing;

After the processor completes reset and the initialization of information, the main loop of the comprehensive debugging module is entered, and the comprehensive debugging module issues an execute command that calls the cpu_exec() function so that the processor executes instructions;

The calling module updates the program counter and calls the function call tracing module to determine whether the instruction at the current program counter address is a function call or a function return;

The calling module calls single_cycle() once to advance the processor by one cycle, and then calls get_cpu_signals() to obtain the relevant signals from the processor;

To determine whether an instruction in the processor has finished executing, a commit module is added to the write-back stage of the processor. The characteristics of instructions that can be committed are shown in Table 1:

Table 1:

To prevent the same instruction from being committed more than once, the commit module registers the instruction from the previous cycle and commits only when the previous cycle's instruction differs from the current cycle's instruction. Because the number of cycles each instruction takes in a multi-cycle pipelined processor is not known in advance, the exec_once() function uses a loop: if the commit enable signal has not been detected high, the execution step above is repeated. This makes the system compatible with pipelined RISC-V processors of any instruction latency. On the same principle, the calling module calls the deadlock detection module after single_cycle() to check whether the processor has deadlocked;
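The commit-driven loop in exec_once() can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the Core struct, its fields, and the fixed per-instruction latency are hypothetical stand-ins for the simulated processor, and the commit (delivery) enable signal is modeled as "the instruction in write-back differs from the one seen in the previous cycle".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the simulated core: a new instruction reaches
 * write-back every `latency` cycles (latency must be at least 1). */
typedef struct {
    uint32_t latency;        /* cycles each instruction takes             */
    uint32_t cycle_in_inst;  /* cycles spent on the current instruction   */
    uint32_t wb_inst;        /* instruction currently in write-back       */
    uint32_t prev_wb_inst;   /* instruction seen in write-back last cycle */
    bool     commit_en;      /* high for the cycle a new instruction commits */
    uint64_t cycles;         /* total cycles simulated                    */
} Core;

/* Advance the model by one clock cycle and update the commit signal. */
void single_cycle(Core *c) {
    c->prev_wb_inst = c->wb_inst;
    c->cycles++;
    if (++c->cycle_in_inst == c->latency) {
        c->cycle_in_inst = 0;
        c->wb_inst++;                       /* a new instruction arrives */
    }
    /* Commit only when the instruction differs from the previous cycle's. */
    c->commit_en = (c->wb_inst != c->prev_wb_inst);
}

/* Keep stepping until one instruction commits; returns the cycles used. */
uint32_t exec_once(Core *c) {
    uint32_t n = 0;
    do {
        single_cycle(c);
        n++;
    } while (!c->commit_en);
    return n;
}
```

Because the loop waits on the commit signal rather than counting a fixed number of cycles, the same exec_once() works for a single-cycle core (latency 1) and for a core that needs several cycles per instruction.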

If the processor accesses memory or a peripheral during execution, the calling module uses the DPI-C mechanism to call the memory access tracing module and the peripheral access tracing module, which notify the development platform and write the relevant information to the log file;

After an instruction finishes executing, the calling module calls the instruction tracing module to record information about the executed instruction;

The calling module calls the differential debugging comparison module to verify that the processor executed correctly. The differential debugging comparison module tells the instruction simulator to execute one instruction, then obtains the instruction simulator's register file state and compares it with the processor's register file state. If the states match, the program counter is updated again and execution continues with the next instruction; if they differ, the simulation exits and the relevant error information is output.
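The register file comparison at the heart of the differential check can be sketched like this. The CpuState layout and the name difftest_check() are assumptions made for illustration; the text only states that the two register file states are compared, so the additional program counter comparison here is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_GPR 32

/* Hypothetical snapshot of architectural state (RV64: 32 x 64-bit GPRs). */
typedef struct {
    uint64_t gpr[NUM_GPR];
    uint64_t pc;
} CpuState;

/* Compare the reference (instruction simulator) state with the processor
 * (DUT) state after one instruction. Returns true when they match and
 * reports the first mismatch on stderr otherwise. */
bool difftest_check(const CpuState *ref, const CpuState *dut) {
    for (int i = 0; i < NUM_GPR; i++) {
        if (ref->gpr[i] != dut->gpr[i]) {
            fprintf(stderr, "x%d mismatch: ref=0x%016llx dut=0x%016llx\n", i,
                    (unsigned long long)ref->gpr[i],
                    (unsigned long long)dut->gpr[i]);
            return false;
        }
    }
    if (ref->pc != dut->pc) {   /* assumption: the PC is compared as well */
        fprintf(stderr, "pc mismatch: ref=0x%016llx dut=0x%016llx\n",
                (unsigned long long)ref->pc, (unsigned long long)dut->pc);
        return false;
    }
    return true;
}
```

A caller would exit the simulation and dump its debugging information when difftest_check() returns false, and otherwise continue with the next instruction.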

In this solution, the number of cycles the waveform slicing module records in each slice is configurable, which greatly reduces the space occupied by the generated waveform files. The differential debugging comparison module adapts to the number of cycles each instruction takes to execute, so it can be used with both single-cycle processors and processors with multi-stage pipelines. All of the above modules are parameterized, and each can be enabled or disabled with a macro. Using the RISC-V-based hardware development system of the present invention greatly improves the efficiency of processor development and debugging.

Each module is described in detail below:

Instruction tracing module: To record the instruction and program counter address after each execution step, the present invention defines a global logbuf string that holds the information for each record. The program counter address and the instruction are first written into logbuf; the LLVM disassembler is then called with the instruction as input, and the disassembled text is appended to logbuf. This completes the record of one executed instruction.

In addition, the present invention defines an instruction buffer that outputs the last N instructions whenever execution ends, whether through an error or normal completion, which makes it easier to find the faulty instruction. A ringbuf structure is defined containing instbuf, head, and count. instbuf is an array of strings that stores the recorded instruction stream. head is a head pointer that indicates where in instbuf the next instruction will be stored, so that instbuf always holds the most recent instructions. count is a counter used to tell whether instbuf has ever been filled. The size of the instbuf array is configurable; it is assumed to be 10 here.

The inst_enqueue() function places an instruction into instbuf at the position given by head. It then updates the head pointer to (head+1)%10, which makes writes wrap around the buffer, and increments count by 1. inst_enqueue() is called after the current instruction has executed, so that the recorded instructions are ones that have executed correctly.

The print_inst() function prints the instructions held in instbuf. If instbuf has been filled, entries head through 9 are output first, followed by entries 0 through head-1, which lists the instructions from oldest to newest. If instbuf has never been filled, entries 0 through head-1 are output directly. print_inst() is called only when the simulation exits.
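The ring buffer described above can be sketched as follows. The field names instbuf, head, and count and the function names inst_enqueue() and print_inst() come from the text; the RingBuf type name, the string length, the explicit pointer parameter (the text describes a global), and the collect_insts() helper are assumptions added so the sketch is self-contained and testable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INSTBUF_SIZE 10   /* configurable; 10 as assumed in the text     */
#define INST_STR_LEN 64   /* assumption: maximum length of one record    */

typedef struct {
    char instbuf[INSTBUF_SIZE][INST_STR_LEN]; /* recorded instruction text */
    int  head;                                /* slot for the next write   */
    int  count;                               /* total records written     */
} RingBuf;

/* Store one instruction record at head, then advance head with wrap-around. */
void inst_enqueue(RingBuf *rb, const char *inst) {
    snprintf(rb->instbuf[rb->head], INST_STR_LEN, "%s", inst);
    rb->head = (rb->head + 1) % INSTBUF_SIZE;
    rb->count++;
}

/* Hypothetical helper: gather pointers to the stored records, oldest first.
 * Returns the number of valid records. */
int collect_insts(const RingBuf *rb, const char *out[INSTBUF_SIZE]) {
    int n = 0;
    if (rb->count >= INSTBUF_SIZE) {          /* buffer has wrapped */
        for (int i = rb->head; i < INSTBUF_SIZE; i++)
            out[n++] = rb->instbuf[i];
    }
    for (int i = 0; i < rb->head; i++)
        out[n++] = rb->instbuf[i];
    return n;
}

/* Print the retained records; intended to be called when simulation exits. */
void print_inst(const RingBuf *rb) {
    const char *lines[INSTBUF_SIZE];
    int n = collect_insts(rb, lines);
    for (int i = 0; i < n; i++)
        printf("%s\n", lines[i]);
}
```

Once the buffer has wrapped, head points at the oldest retained record, which is why reading from head to the end and then from 0 to head-1 yields chronological order.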

Waveform slicing module: Verilator drives a simulation by advancing simulation time, and when waveform generation is enabled the waveform file must be opened when the simulation starts and closed when it ends. This property can be used to refresh the waveform file. The waveform slicing module defines a waveform refresh counter that is incremented by 1 each time the simulation advances one cycle. When the counter reaches N, the waveform file is refreshed, that is, closed and then reopened.
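The refresh counter can be sketched in isolation as below. The WaveSlicer type and the function names are hypothetical, and wave_reopen() is only a placeholder that counts refreshes; in a real Verilator testbench (which is written in C++) that step would close and reopen the trace file, which is assumed here rather than shown.

```c
#include <assert.h>

/* Hypothetical state for the waveform slicing module. */
typedef struct {
    unsigned slice_cycles;   /* N: cycles kept per waveform slice        */
    unsigned counter;        /* cycles recorded in the current slice     */
    unsigned reopen_count;   /* how many times the file was refreshed    */
} WaveSlicer;

/* Placeholder for "close the waveform file, then open it again", which
 * discards the cycles recorded so far. */
void wave_reopen(WaveSlicer *w) {
    w->reopen_count++;
}

/* Call once per simulated cycle, e.g. from single_cycle(). */
void wave_slice_tick(WaveSlicer *w) {
    if (++w->counter >= w->slice_cycles) {
        wave_reopen(w);
        w->counter = 0;
    }
}
```

With this scheme the waveform file never holds more than N cycles, so its size is bounded by the slice length rather than by the length of the whole simulation.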

Deadlock detection module: When the processor deadlocks, instructions cannot advance to the next pipeline stage, so the instruction committed in the last stage stops changing. The deadlock detection module defines a deadlock counter and examines the instruction in the commit module after every cycle. If the current cycle's instruction is the same as the previous cycle's, the deadlock counter is incremented by 1; the counter is cleared after each instruction is committed. When the counter reaches N (N is typically set to a few hundred), the processor is judged to have deadlocked, and the hardware development system is notified to exit the simulation and print the relevant debugging information.
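A minimal sketch of the deadlock counter follows. The DeadlockDetector type, its field names, and the deadlock_check() signature are assumptions; only the counting rule (increment while the instruction in the commit module is unchanged, clear on commit, trip at N) is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for the deadlock detection module. */
typedef struct {
    uint32_t limit;      /* N: stalled cycles tolerated before giving up  */
    uint32_t counter;    /* consecutive cycles without a new instruction  */
    uint32_t last_inst;  /* instruction seen in the commit module last cycle */
} DeadlockDetector;

/* Call once per cycle with the instruction currently in the commit module
 * and the commit enable signal. Returns true when a deadlock is declared. */
bool deadlock_check(DeadlockDetector *d, uint32_t wb_inst, bool commit_en) {
    if (commit_en || wb_inst != d->last_inst) {
        d->counter = 0;              /* progress was made this cycle */
    } else {
        d->counter++;                /* same instruction as last cycle */
    }
    d->last_inst = wb_inst;
    return d->counter >= d->limit;
}
```

When deadlock_check() returns true, the caller would stop the simulation and print its debugging information, for example the most recent instruction trace.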

Memory access tracing module: Memory accesses in the processor can be implemented with the DPI-C mechanism supported by Verilator: the Verilog code calls a C function that accesses a memory array (pmem), which behaves much like accessing an SRAM. Within the DPI-C functions, the present invention defines a memory_trace() function that takes the PC address of the accessing instruction, the accessed memory address, the data, the data length, and the access type (read/write) as parameters and writes this information to the log file.
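A memory_trace() function with the parameters listed above might look like the sketch below. The record layout, the "[mtrace]" prefix, the explicit FILE pointer parameter, and the format_mem_trace() helper are all assumptions chosen for illustration; the text specifies only which values are logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one memory access record into buf.
 * Returns the value returned by snprintf(). */
int format_mem_trace(char *buf, size_t n, uint64_t pc, uint64_t addr,
                     uint64_t data, int len, bool is_write) {
    return snprintf(buf, n, "pc=0x%08llx %c addr=0x%08llx len=%d data=0x%llx",
                    (unsigned long long)pc, is_write ? 'W' : 'R',
                    (unsigned long long)addr, len,
                    (unsigned long long)data);
}

/* Write one record to the log file. In a testbench this would be called
 * from inside the DPI-C read and write functions. */
void memory_trace(FILE *log, uint64_t pc, uint64_t addr, uint64_t data,
                  int len, bool is_write) {
    char line[128];
    format_mem_trace(line, sizeof line, pc, addr, data, len, is_write);
    fprintf(log, "[mtrace] %s\n", line);
}
```

Logging the PC together with the address makes it possible to go from a suspicious memory access in the log straight to the instruction that issued it.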

Peripheral access tracing module: According to the RISC-V manual, peripheral access is implemented through memory mapping: a peripheral is accessed with ordinary memory access instructions, and the address is the peripheral's address on the bus rather than a memory address. As in the memory access tracing module, the peripheral access tracing module defines a device_trace() function inside the DPI-C memory access functions. It takes the PC address of the instruction accessing the peripheral, the peripheral address, the data, the data length, and the access type (read/write) as parameters, determines from the address which peripheral is being accessed, and writes this information to the log file.
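The step in device_trace() that maps an address to a peripheral can be sketched as a table lookup. The DeviceRegion type, the find_device() name, and in particular the device names and address ranges in device_map are hypothetical examples; the actual memory map depends on the processor under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one memory-mapped peripheral. */
typedef struct {
    const char *name;
    uint64_t    base;   /* first address of the peripheral's range */
    uint64_t    size;   /* length of the range in bytes            */
} DeviceRegion;

/* Example map only; these addresses are assumptions for illustration. */
static const DeviceRegion device_map[] = {
    { "serial", 0xa00003f8ULL, 8 },
    { "rtc",    0xa0000048ULL, 8 },
};

/* Return the name of the peripheral that owns addr, or NULL when the
 * address is not a peripheral address (i.e. an ordinary memory access). */
const char *find_device(uint64_t addr) {
    size_t n = sizeof device_map / sizeof device_map[0];
    for (size_t i = 0; i < n; i++) {
        if (addr >= device_map[i].base &&
            addr - device_map[i].base < device_map[i].size) {
            return device_map[i].name;
        }
    }
    return NULL;
}
```

A device_trace() built on this lookup would log the peripheral name alongside the PC, address, data, length, and access type whenever find_device() returns a non-NULL name.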

Function call tracing module: According to the RISC-V manual, function calls and returns are performed with the jal and jalr instructions. The present invention therefore defines trace_func_call() and trace_func_ret(), which are called when jal and jalr execute. trace_func_call() takes the program counter and target as parameters, where the program counter is the address of the instruction and target is the entry address of the called function. For example, in the following call, pc=0x8000000c and target=0x80000260: 0x8000000c: call[_trm_init@0x80000260].

Compilers sometimes optimize a function return into a tail call: the call is made with the pseudoinstruction jr (which expands to jalr x0,0(rs1)), so the return address is written not to x1 (the return address register for jumps) but to x0 (hardwired to 0, ignores writes), which effectively discards it. trace_func_call() therefore also has to recognize the case where the destination register is x0 and record the current program counter value, marking a function that will not return explicitly. When a function returns, the tracer checks whether the level above holds a function that is "waiting to return"; if so, trace_func_ret() is called to return from it as well. trace_func_ret() takes the program counter as its parameter, where the program counter is the address of the instruction at which the function returned.

To identify the names of called and returning functions, the present invention parses the .symtab section of the ELF file. A structure SymEntry is defined to store one row of the symbol table (symtab). symbol_tbl is an array of SymEntry, and each element represents one symbol recorded in the symbol table, that is, one row of .symtab. The read_symbopl_table() function parses the address, info, size, and other fields of the symbol table in the ELF file and copies them into symbol_tbl. The find_symbol_func() function then converts a function address into the corresponding index in symbol_tbl. A call always jumps to the function's entry address, whereas a return instruction may lie anywhere in the interval [Value, Value+Size), so is_call is used to distinguish calls from returns and make the lookup more precise.
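The address-to-symbol lookup with is_call can be sketched as follows. The names SymEntry, find_symbol_func(), and is_call come from the text, but the SymEntry fields, the function signature, and the return convention (-1 for "not found") are assumptions, and the table is taken to contain only function symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one parsed .symtab row (function symbols only). */
typedef struct {
    char     name[32];
    uint64_t value;   /* entry address of the function (st_value) */
    uint64_t size;    /* size of the function in bytes (st_size)  */
} SymEntry;

/* Map an address to an index in the symbol table, or -1 if none matches.
 * A call must land exactly on a function's entry address; a return may be
 * anywhere inside [value, value + size). */
int find_symbol_func(const SymEntry *tbl, int n, uint64_t addr, bool is_call) {
    for (int i = 0; i < n; i++) {
        if (is_call) {
            if (addr == tbl[i].value)
                return i;
        } else if (addr >= tbl[i].value && addr - tbl[i].value < tbl[i].size) {
            return i;
        }
    }
    return -1;
}
```

Requiring an exact match for calls means that a jump into the middle of a function is not mistaken for a call to that function.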

Instruction simulator: The instruction simulator of the present invention is trace-driven. Its instruction set architecture is RV64IM, and macro switches make it compatible with the RV32E, RV32I, and RV32IM instruction sets. As shown in FIG. 3, the instruction simulator has two main phases: it first initializes its information and then enters the main instruction execution loop.

Initialization is shown in Figure 4. The first step initializes the input information: based on the compilation parameters given in the Makefile, this function decides whether to enable batch mode and records the log file, the image file required by the differential debugging comparison module, and the ELF file required by the function call tracking module.

The random number generator is initialized with the srand() function of stdlib.h; the current time of the computer is obtained with gettimeofday(), converted, and passed to srand() as its parameter. Log file initialization directs log output to stdout (the terminal) by default and then opens the specified log file with fopen() in w mode; if the file cannot be opened, an Assert fails and an error is reported. Memory initialization uses memset() to fill pmem (the array that models memory) with a random value taken from the rand() function of stdlib.h. According to the RISC-V manual, the program counter address is reset to 0x8000_0000, so the address used to access pmem is actually pc - 0x8000_0000. Peripheral initialization adds a port for each supported peripheral.

The instruction simulator of the present invention is an RV64IM architecture, which according to the RISC-V manual requires 32 general registers, each 64 bits wide. A structure cpu is therefore defined with the members gpr and pc: cpu.gpr provides the 32 64-bit general registers, and cpu.pc stores the program counter address of the next instruction. ISA initialization first copies a small program into pmem so that the instruction simulator does not fail when no program image exists. It then sets cpu.pc to the reset value 0x8000_0000. Because register 0 of the general registers is always 0 according to the RISC-V manual, cpu.gpr[0] is also set to 0.

Test program initialization first checks whether the img_file pointer is null; if so, it prints an error message and returns 4096 (the size of the default image). Otherwise it opens the image file with fopen() in rb mode; if the file cannot be opened, an Assert reports the error. It then moves the file pointer to the end of the file with fseek() and reads the current position with ftell(), which gives the file size. A second fseek() moves the file pointer back to the beginning, and fread() reads the data from the file into pmem. Finally, the size of the test program image is returned.

Because the instruction simulator serves as the reference for the processor, its initialization of the differential debugging comparison module is simply an empty function. Initialization of the comprehensive debugging module starts with the regular expressions: for each rule, regcomp() compiles the rule's regular expression and the return value ret is checked; a nonzero value means that compilation failed. After this loop, all regular expression rules have been compiled and stored in their regex_t structures. The monitoring points are initialized next. They are implemented as a linked list built by head insertion, so every node points to the next node except the last, which points to NULL. Finally, instruction buffer initialization sets head and count of the ringbuf structure to 0.
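The image-loading step and the pc - 0x8000_0000 address translation described above can be sketched as follows. This is a minimal sketch under stated assumptions: PMEM_SIZE, the helper name guest_to_host() and the use of a -1 return value in place of the Assert mentioned in the text are choices made for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PMEM_BASE 0x80000000u   /* reset value of the program counter */
#define PMEM_SIZE (1u << 20)    /* assumed memory size for this sketch */

static uint8_t pmem[PMEM_SIZE];

/* Translate a guest address to a pointer into pmem (index = addr - base).
 * Returns NULL when the address lies outside the modelled memory. */
uint8_t *guest_to_host(uint64_t addr) {
    if (addr < PMEM_BASE || addr - PMEM_BASE >= PMEM_SIZE) return NULL;
    return pmem + (addr - PMEM_BASE);
}

/* Load the image file into pmem and return its size in bytes.
 * Returns 4096 when no image is given (size of the default image) and
 * -1 on failure (the text uses an Assert here instead). */
long load_img(const char *img_file) {
    if (img_file == NULL) {
        fprintf(stderr, "No image is given, using the default image.\n");
        return 4096;
    }
    FILE *fp = fopen(img_file, "rb");
    if (fp == NULL) return -1;

    fseek(fp, 0, SEEK_END);          /* move to the end of the file */
    long size = ftell(fp);           /* position at the end == file size */
    if (size < 0 || (unsigned long)size > PMEM_SIZE) {
        fclose(fp);
        return -1;
    }
    fseek(fp, 0, SEEK_SET);          /* back to the beginning */
    size_t n = fread(pmem, 1, (size_t)size, fp);
    fclose(fp);
    return n == (size_t)size ? size : -1;
}
```

Bounds-checking the file size against PMEM_SIZE is an addition of this sketch; the text does not say how oversized images are handled.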

After initialization, the simulator enters the main loop of instruction execution; the execution flow is shown in Figure 5. The present invention defines a global structure Decode with the members pc, snpc, dnpc, isa_inst_val and logbuf. Decode.isa_inst_val holds the instruction under the current instruction set, and Decode.logbuf is used by the instruction tracking module to record information about the instruction being executed. Decode.pc, Decode.snpc and Decode.dnpc are used mainly for instruction execution: Decode.pc is the program counter address of the current instruction; Decode.snpc is the static next PC, which in RV64IM is snpc = pc + 4; Decode.dnpc is the dynamic next PC, which equals snpc by default and is set to the actual target address when a branch or jump instruction is taken. A global structure CPUState is also maintained, with the members state, halt_pc and halt_ret. state records the current state of the simulator and takes one of the values RUNNING, STOP, END, ABORT or QUIT. halt_pc is the program counter address at which the simulator stopped running, and halt_ret is the value of the $a0 register at that point, which indicates whether the program ran without error.
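The two global structures can be sketched in C as below. The member names come from the text; the member types, the logbuf length and the two helper functions decode_begin() and decode_jump() are assumptions introduced only to show how pc, snpc and dnpc relate.

```c
#include <stdint.h>

/* Simulator states named in the text. */
typedef enum { RUNNING, STOP, END, ABORT, QUIT } SimState;

typedef struct {
    SimState state;     /* current state of the simulator */
    uint64_t halt_pc;   /* pc at which the simulator stopped */
    uint64_t halt_ret;  /* value of $a0 when the simulator stopped */
} CPUState;

typedef struct {
    uint64_t pc;            /* address of the current instruction */
    uint64_t snpc;          /* static next pc: pc + 4 */
    uint64_t dnpc;          /* dynamic next pc: snpc unless redirected */
    uint32_t isa_inst_val;  /* raw instruction word */
    char logbuf[128];       /* trace text (length assumed) */
} Decode;

/* Hypothetical helper: prepare s for the instruction at pc. */
void decode_begin(Decode *s, uint64_t pc) {
    s->pc = pc;
    s->snpc = pc + 4;
    s->dnpc = s->snpc;   /* default: fall through to the next instruction */
}

/* Hypothetical helper: a taken branch or jump overrides dnpc only. */
void decode_jump(Decode *s, uint64_t target) {
    s->dnpc = target;
}
```

Keeping snpc and dnpc separate means a branch handler only has to choose between two precomputed values, which is how the beq example in the text is written.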

Instruction execution is encapsulated in the cpu_exec() function, which takes a parameter n, the number of instructions to execute. cpu_exec() first checks whether n is less than the configured MAX_INST_TO_PRINT, which prevents too much instruction information from being printed on the screen. It then checks the state in CPUState: if the state is END or ABORT, it returns immediately; otherwise it sets state to RUNNING. Next it calls the execute() function, which also takes n, the number of instructions to execute. When execute() finishes, the result printed depends on the value of state: for RUNNING or STOP, cpu_exec() simply returns; for END or ABORT, it prints the program counter address of the instruction at which execution stopped and finally outputs information such as the running time and the number of instructions executed.

The execute() function contains a for loop that executes N instructions. Executing one instruction is implemented by the exec_once() function, which takes a pointer &s of type Decode and the global variable cpu.pc. exec_once() first sets both s->pc and s->snpc to pc, the address of the current instruction. It then calls isa_exec_once(), which takes the Decode pointer &s as its parameter and implements instruction fetch, decode and execution. The fetch function inst_fetch() reads the instruction at the program counter address from the memory array, stores it in isa_inst_val and advances snpc by 4. The decode_exec() function, which also takes the Decode pointer &s, then decodes the immediates of the I, U, S, J and B types as specified in the RISC-V manual and reads src1 and src2 from the general registers using the addresses of the source registers rs1 and rs2. By default, dnpc is set to snpc.

An INSTPAT macro is then defined. In the RISC-V instruction formats, an instruction is determined by its opcode, funct3 and funct7 fields, so the macro matches the instruction against the supplied bit pattern and performs the operation that belongs to the matching instruction. For the add instruction, the arguments are: INSTPAT("0000000 ????? ????? 000 ????? 01100 11", add, R, R(rd) = src1 + src2); a match identifies an R-type instruction, the values of rs1 and rs2 are added, and the result is written to register rd. For the branch instruction beq, the arguments are: INSTPAT("??????? ????? ????? 000 ????? 11000 11", beq, B, s->dnpc = (src1 == src2) ? (s->pc + imm) : s->snpc); beq is a B-type instruction that compares the values of the source registers rs1 and rs2; if they are equal, the next pc address (dnpc) is set to pc + imm, otherwise it remains snpc. An ebreak instruction triggers the CPUTRAP() function, which sets state to END and uses the value of the $a0 register to determine whether the result of the program is correct.

When the instruction has executed, control returns to exec_once(), which sets cpu.pc to dnpc, the final next PC. The counter that records executed instructions is then incremented by 1 and trace_and_difftest() is called. This function invokes the differential debugging comparison module and the instruction tracking module and, depending on the situation, prints the trace of the executed instruction on the screen. Next, the loop checks whether state is still RUNNING; if not, it exits without executing further instructions, and the value of state determines whether the simulator ran the program correctly. Finally, if the simulator supports peripherals, the state of the peripherals is refreshed once.
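The pattern-matching idea behind INSTPAT, together with the add and beq examples, can be sketched as a plain function instead of a macro. This is an illustrative sketch, not the macro from the text: pattern_match(), imm_b() and the reduced exec_once() signature (the instruction word is passed in rather than fetched from memory) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t gpr[32]; uint64_t pc; } CPU;
typedef struct { uint64_t pc, snpc, dnpc; uint32_t isa_inst_val; } Decode;

/* Match a 32-bit instruction against a pattern of '0', '1' and '?'.
 * Spaces are ignored; '?' marks a don't-care bit. */
bool pattern_match(uint32_t inst, const char *pat) {
    uint32_t mask = 0, key = 0;
    int bits = 0;
    for (const char *p = pat; *p != '\0'; p++) {
        if (*p == ' ') continue;
        mask = (mask << 1) | (uint32_t)(*p != '?');
        key  = (key  << 1) | (uint32_t)(*p == '1');
        bits++;
    }
    return bits == 32 && (inst & mask) == key;
}

/* Decode the B-type immediate (bits 12|10:5|4:1|11) and sign-extend it. */
static int64_t imm_b(uint32_t i) {
    uint32_t imm = (((i >> 31) & 0x1u) << 12) | (((i >> 7) & 0x1u) << 11) |
                   (((i >> 25) & 0x3fu) << 5) | (((i >> 8) & 0xfu) << 1);
    return (int64_t)(imm ^ 0x1000u) - 0x1000;   /* sign bit is bit 12 */
}

/* Execute one already-fetched instruction; only add and beq are handled. */
void exec_once(CPU *cpu, Decode *s, uint32_t inst) {
    s->pc = cpu->pc;
    s->snpc = s->pc + 4;
    s->dnpc = s->snpc;                 /* default next pc */
    s->isa_inst_val = inst;

    uint32_t rd = (inst >> 7) & 0x1fu;
    uint64_t src1 = cpu->gpr[(inst >> 15) & 0x1fu];
    uint64_t src2 = cpu->gpr[(inst >> 20) & 0x1fu];

    if (pattern_match(inst, "0000000 ????? ????? 000 ????? 01100 11")) {
        cpu->gpr[rd] = src1 + src2;                        /* add */
    } else if (pattern_match(inst, "??????? ????? ????? 000 ????? 11000 11")) {
        if (src1 == src2)                                  /* beq */
            s->dnpc = s->pc + (uint64_t)imm_b(inst);
    }
    cpu->gpr[0] = 0;                   /* x0 always reads as 0 */
    cpu->pc = s->dnpc;
}
```

For instance, the word 0x002081b3 (add x3, x1, x2) matches the first pattern, and 0x00208463 (beq x1, x2, 8) matches the second.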

Differential debugging comparison module: The differential debugging comparison module of the present invention has two parts, the REF part in the instruction simulator and the DUT part in the processor, as shown in Figure 6.

In the REF part, the difftest_memcpy() function copies n bytes from the DUT into the memory of the REF, that is, it copies the test program into the memory of the instruction simulator. The diff_get_regs() function passes the register state of the REF to the DUT: it takes diff_context, a structure of the same type as cpu, and performs diff_context->gpr[i] = cpu.gpr[i] in a loop. The diff_set_regs() function sets the register state of the REF from the DUT and works the other way round, performing cpu.gpr[i] = diff_context->gpr[i]. The difftest_regcpy() function copies the program counter and register state between the REF and the DUT. When direction is `DIFFTEST_TO_DUT`, it passes the register state and pc of the REF to the DUT by calling diff_get_regs(); when direction is `DIFFTEST_TO_REF`, it sets the register state and pc of the REF from the DUT by calling diff_set_regs(). The difftest_exec() function makes the REF execute N instructions; because the present invention uses the instruction simulator as the REF, this is simply a call to cpu_exec(n).

Finally, the differential debugging comparison function of the REF is initialized by calling the memory initialization and ISA initialization steps of the initialization phase. Because the memory space is too large to compare, and because judging whether the processor behaves correctly essentially requires observing only the general registers in the processor core, the differential debugging comparison module of the present invention compares only the general registers. The instruction simulator is then compiled into a shared object file for dynamic linking.
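The register-copy part of the REF interface can be sketched as below. The function names follow the text; the exact signatures, the numeric values of the direction constants and the pairing of each direction with a helper (chosen here to agree with what each helper is described as doing) are assumptions.

```c
#include <stdint.h>

enum { DIFFTEST_TO_DUT = 0, DIFFTEST_TO_REF = 1 };   /* values assumed */

typedef struct { uint64_t gpr[32]; uint64_t pc; } CPU;

CPU cpu;   /* register state of the REF (the instruction simulator) */

/* REF -> DUT: copy the simulator's registers and pc into the caller's buffer. */
static void diff_get_regs(CPU *diff_context) {
    for (int i = 0; i < 32; i++) diff_context->gpr[i] = cpu.gpr[i];
    diff_context->pc = cpu.pc;
}

/* DUT -> REF: overwrite the simulator's registers and pc from the buffer. */
static void diff_set_regs(const CPU *diff_context) {
    for (int i = 0; i < 32; i++) cpu.gpr[i] = diff_context->gpr[i];
    cpu.pc = diff_context->pc;
}

/* Copy pc and register state between REF and DUT in the given direction. */
void difftest_regcpy(void *dut, int direction) {
    if (direction == DIFFTEST_TO_DUT) {
        diff_get_regs((CPU *)dut);
    } else {
        diff_set_regs((const CPU *)dut);
    }
}
```

Taking the DUT state as a void pointer keeps the exported symbol free of simulator-specific types, which is convenient when the function is later looked up by name in a shared object; that design choice is an assumption of this sketch.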

In the DUT part, the init_difftest() function is defined first. It opens the dynamic library file ref_so_file passed to it with dlopen(), then resolves and relocates the API symbols in the dynamic library (the differential debugging functions defined in the REF part) through dynamic linking and obtains their addresses. Next it initializes the differential debugging comparison function of the REF by calling the REF's difftest_init() function. It then copies the memory data, register state and cpu.pc of the DUT to the REF, which completes the initialization of differential debugging comparison.

The differential debugging comparison flow of the DUT part is shown in Figure 7. The difftest_step() function is called after exec_once(). It defines ref_r, a pointer of the same type as cpu, as the intermediate variable for the comparison between the REF and the DUT. difftest_step() first checks whether the check is to be skipped for this instruction; if so, it only sets the register state of the REF from the DUT. Otherwise it lets the REF execute one instruction, obtains the register state of the REF, and calls checkregs() to compare the general registers of the REF and the DUT. checkregs() first compares the pc of the REF and the DUT; if they differ, the states of the REF and the DUT are inconsistent, so the simulation exits immediately and outputs the current pc address. If the pc addresses match, a for loop compares the values of the 32 general registers; if they are all equal, the processor executed the instruction correctly and the function returns.
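The comparison performed by checkregs() can be sketched as follows. This is a minimal sketch: the signature, the boolean return value (the text exits the simulation directly on a mismatch) and the message format are assumptions.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t gpr[32]; uint64_t pc; } CPU;

/* Compare the REF state with the DUT state after one instruction.
 * Returns true when pc and all 32 general registers agree. */
bool checkregs(const CPU *ref_r, const CPU *dut) {
    if (ref_r->pc != dut->pc) {
        fprintf(stderr, "pc mismatch: ref=0x%016" PRIx64 " dut=0x%016" PRIx64 "\n",
                ref_r->pc, dut->pc);
        return false;
    }
    for (int i = 0; i < 32; i++) {
        if (ref_r->gpr[i] != dut->gpr[i]) {
            fprintf(stderr, "x%d mismatch at pc=0x%016" PRIx64 "\n", i, dut->pc);
            return false;
        }
    }
    return true;
}
```

A caller in the style of difftest_step() would stop the simulation when this function returns false.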

Comprehensive debugging module: The comprehensive debugging module is essentially a loop. It uses the readline library to provide a command line and parses the command line input to decide which operation to perform, as shown in Figure 8.

Single-step execution calls the cpu_exec(n) function, where n is the number entered on the command line. Continuing execution calls cpu_exec(-1) directly: because the parameter of cpu_exec() is of type uint64_t, -1 is converted to the maximum value of uint64_t. Scanning memory accesses pmem at the given memory address and returns the data stored at that address.
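The effect of passing -1 to a uint64_t parameter can be seen in the small sketch below. The body of cpu_exec() here is a stand-in: the counter, the hypothetical limit of 1000 instructions after which the "program" halts, and the returned count are assumptions made so that the call terminates.

```c
#include <stdint.h>

static uint64_t executed = 0;          /* instructions executed so far */
static const uint64_t program_len = 1000;   /* hypothetical: program halts here */

/* Stand-in for cpu_exec(): run at most n instructions, stopping early when
 * the program ends. Returns how many instructions were executed. */
uint64_t cpu_exec(uint64_t n) {
    uint64_t done = 0;
    while (done < n && executed < program_len) {
        executed++;   /* one simulated instruction */
        done++;
    }
    return done;
}
```

Calling cpu_exec(-1) converts -1 to UINT64_MAX at the call, so the loop bound is effectively unlimited and execution runs until the program itself stops.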

Evaluating an expression first requires regular expression rules for its tokens: the arithmetic operators for addition, subtraction, multiplication and division; the logical operators AND, OR, equal and not equal; left and right parentheses; decimal and hexadecimal numbers; the 32 general registers; and pc. The expression is then split at its main operator, the operator with the lowest precedence. Evaluation proceeds as follows: if the expression is a single number, its value is returned; if it is a general register, the value of that register is returned; if it is pc, the value of cpu.pc is returned; if it is enclosed in parentheses, the outer pair is removed and the evaluation recurses; otherwise the main operator is applied and the computed value is returned.
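The split-at-the-main-operator evaluation can be sketched as below. This sketch makes several simplifying assumptions: it uses a hand-written tokenizer instead of the regcomp() rules mentioned in the text, it omits register and pc operands and unary operators, and all names (make_tokens, check_parentheses, eval, expr) are chosen for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { TK_NUM, TK_LP, TK_RP, TK_ADD, TK_SUB, TK_MUL, TK_DIV,
               TK_EQ, TK_NEQ, TK_AND, TK_OR } TokType;
typedef struct { TokType type; uint64_t val; } Token;

#define MAX_TOKENS 64
static Token tokens[MAX_TOKENS];
static int nr_token;

/* Split the input into tokens; false on an unknown character or overflow. */
static bool make_tokens(const char *e) {
    nr_token = 0;
    while (*e != '\0') {
        if (isspace((unsigned char)*e)) { e++; continue; }
        if (nr_token == MAX_TOKENS) return false;
        Token *t = &tokens[nr_token];
        t->val = 0;
        if (isdigit((unsigned char)*e)) {
            int base = (e[0] == '0' && (e[1] == 'x' || e[1] == 'X')) ? 16 : 10;
            char *end;
            t->type = TK_NUM;
            t->val = strtoull(e, &end, base);
            e = end;
        } else if (e[0] == '=' && e[1] == '=') { t->type = TK_EQ;  e += 2; }
        else if (e[0] == '!' && e[1] == '=')   { t->type = TK_NEQ; e += 2; }
        else if (e[0] == '&' && e[1] == '&')   { t->type = TK_AND; e += 2; }
        else if (e[0] == '|' && e[1] == '|')   { t->type = TK_OR;  e += 2; }
        else {
            switch (*e++) {
                case '(': t->type = TK_LP;  break;
                case ')': t->type = TK_RP;  break;
                case '+': t->type = TK_ADD; break;
                case '-': t->type = TK_SUB; break;
                case '*': t->type = TK_MUL; break;
                case '/': t->type = TK_DIV; break;
                default:  return false;
            }
        }
        nr_token++;
    }
    return nr_token > 0;
}

/* Binding strength of an operator; 0 means "not an operator". */
static int precedence(TokType t) {
    switch (t) {
        case TK_OR: return 1;  case TK_AND: return 2;
        case TK_EQ: case TK_NEQ: return 3;
        case TK_ADD: case TK_SUB: return 4;
        case TK_MUL: case TK_DIV: return 5;
        default: return 0;
    }
}

/* True when tokens[p..q] is enclosed by one matching pair of parentheses. */
static bool check_parentheses(int p, int q) {
    if (tokens[p].type != TK_LP || tokens[q].type != TK_RP) return false;
    int depth = 0;
    for (int i = p; i <= q; i++) {
        if (tokens[i].type == TK_LP) depth++;
        else if (tokens[i].type == TK_RP && --depth == 0 && i < q) return false;
    }
    return depth == 0;
}

/* Evaluate tokens[p..q]; *ok is cleared on a malformed expression. */
static uint64_t eval(int p, int q, bool *ok) {
    if (p > q) { *ok = false; return 0; }
    if (p == q) {
        if (tokens[p].type != TK_NUM) *ok = false;
        return tokens[p].val;
    }
    if (check_parentheses(p, q)) return eval(p + 1, q - 1, ok);
    int op = -1, depth = 0;   /* main operator: lowest precedence, rightmost */
    for (int i = p; i <= q; i++) {
        TokType t = tokens[i].type;
        if (t == TK_LP) depth++;
        else if (t == TK_RP) depth--;
        else if (depth == 0 && precedence(t) > 0 &&
                 (op < 0 || precedence(t) <= precedence(tokens[op].type))) op = i;
    }
    if (op < 0) { *ok = false; return 0; }
    uint64_t a = eval(p, op - 1, ok), b = eval(op + 1, q, ok);
    if (!*ok) return 0;
    switch (tokens[op].type) {
        case TK_ADD: return a + b;   case TK_SUB: return a - b;
        case TK_MUL: return a * b;
        case TK_DIV: if (b == 0) { *ok = false; return 0; } return a / b;
        case TK_EQ:  return a == b;  case TK_NEQ: return a != b;
        case TK_AND: return a && b;  default:     return a || b;
    }
}

uint64_t expr(const char *e, bool *ok) {
    *ok = make_tokens(e);
    return *ok ? eval(0, nr_token - 1, ok) : 0;
}
```

Choosing the rightmost operator among those of equal lowest precedence makes subtraction and division left-associative, so 10 - 3 - 2 is evaluated as (10 - 3) - 2.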

Monitoring points are implemented as a linked list built by head insertion and rely on expression evaluation. Suppose that at the start of the program, at pc = 0x8000_0000, the command w pc==0x8000_000c is entered. The expression evaluates to false and is written to the first node of the linked list. Instructions are then executed; when execution reaches pc = 0x8000_000c, the monitoring point detects that the value of the node has changed from false to true and sets state to STOP, which stops instruction execution. This gives an effect similar to a breakpoint. If the monitored expression is a value in memory, changes to the memory data can be monitored in the same way, which greatly improves debugging efficiency.
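A head-insertion list of monitoring points (watchpoints) can be sketched as follows. To keep the example self-contained, the expression evaluator is replaced by a function pointer; the structure layout, the names new_wp(), check_watchpoints(), pc_equals() and the variable sim_pc (a stand-in for cpu.pc) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the expression evaluator: computes the watched value. */
typedef uint64_t (*EvalFn)(const void *arg);

typedef struct WP {
    int no;              /* sequence number of the monitoring point */
    EvalFn eval;         /* how to evaluate the watched expression */
    const void *arg;     /* argument passed to eval */
    uint64_t old_val;    /* value seen at the previous check */
    struct WP *next;     /* next node; NULL for the last node */
} WP;

WP *head = NULL;
static int next_no = 0;

uint64_t sim_pc = 0x80000000u;   /* stand-in for cpu.pc */

/* Example expression: pc == *target. */
uint64_t pc_equals(const void *arg) {
    return sim_pc == *(const uint64_t *)arg;
}

/* Create a monitoring point and insert it at the head of the list. */
WP *new_wp(EvalFn eval, const void *arg) {
    WP *wp = malloc(sizeof *wp);
    if (wp == NULL) return NULL;
    wp->no = next_no++;
    wp->eval = eval;
    wp->arg = arg;
    wp->old_val = eval(arg);   /* record the initial value */
    wp->next = head;           /* head insertion */
    head = wp;
    return wp;
}

/* Re-evaluate every monitoring point; true if any value changed.
 * The caller would then set state to STOP. */
bool check_watchpoints(void) {
    bool changed = false;
    for (WP *wp = head; wp != NULL; wp = wp->next) {
        uint64_t v = wp->eval(wp->arg);
        if (v != wp->old_val) {
            wp->old_val = v;
            changed = true;
        }
    }
    return changed;
}
```

Starting at 0x8000_0000 with a watch on pc == 0x8000_000c, check_watchpoints() first reports a change after the pc has advanced three times by 4, which mirrors the example in the text.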

Exiting the simulation is done by setting state to QUIT. To make debugging easier, the current instruction and pc address are printed on exit.

Printing registers uses a for loop to print the value of each general register in cpu.gpr in turn.

The above are merely embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present invention.

Claims (10)

1. A RISC-V based hardware development system, comprising: an instruction tracking module, a waveform slicing module, a deadlock detection module, a memory access tracking module, a function call tracking module, a peripheral access tracking module, an instruction simulator, a differential debugging comparison module, a comprehensive debugging module and a calling module; wherein:
after a processor is connected to the RISC-V based hardware development system, advancing one cycle is encapsulated in a single_cycle() function, and the calling module calls the waveform slicing module once in the single_cycle() function to perform waveform slicing;
after the processor completes reset and the information is initialized, the main loop of the comprehensive debugging module is entered, and the comprehensive debugging module issues an execution command that calls the cpu_exec() function to make the processor execute instructions;
the calling module updates the program counter and calls the function call tracking module to determine whether the address of the current program counter is a function call or a function return;
the calling module calls the single_cycle() function once, indicating that the processor advances by one cycle, and then calls the get_cpu_signals() function to acquire the relevant signals in the processor;
the calling module calls the deadlock detection module after the single_cycle() function to check whether the processor has entered an endless loop;
if the processor accesses memory or a peripheral during execution, the calling module calls the memory access tracking module and the peripheral access tracking module through a DPI-C mechanism to notify the development platform, and the relevant information is written to the log file;
after one instruction has been executed, the calling module calls the instruction tracking module to record the information of the executed instruction;
the calling module calls the differential debugging comparison module to verify whether the processor executed correctly; the differential debugging comparison module notifies the instruction simulator to execute one instruction and acquires the register file state of the instruction simulator for comparison with the register file state of the processor; if the states are the same, the program counter is updated again and the next instruction is executed; if the states differ, the simulation exits and the relevant error information is output.
2. The RISC-V based hardware development system of claim 1, wherein: the write-back stage of the processor is provided with a delivery module, which is specifically configured to deliver only when the instruction of the previous cycle differs from the instruction of the current cycle.
3. The RISC-V based hardware development system of claim 1, wherein: the instruction tracking module is specifically configured to write the address of the program counter and the instruction into a logbuf character string, and to invoke llvm for disassembly, passing in the instruction information and writing the disassembled information into logbuf, thereby completing the record of one executed instruction.
4. The RISC-V based hardware development system of claim 1, wherein: the waveform slicing module is specifically configured to record the waveform of the last N cycles, to discard the waveform of those N cycles if the processor executed correctly during them, and to start recording the waveform of the next N cycles.
5. The RISC-V based hardware development system of claim 1, wherein: the deadlock detection module is specifically configured to record the number of cycles for which the currently delivered instruction has been running; if the current instruction has been executing for more than a preset number of cycles, it determines that the processor has entered a deadlock, and the hardware development system exits the simulation and outputs the relevant debugging information.
6. The RISC-V based hardware development system of claim 1, wherein: the memory access tracking module is specifically configured to output the relevant information to the log file according to the type of the memory access instruction, the access address and the access data, and to record the pc address at the time of the access.
7. The RISC-V based hardware development system of claim 1, wherein: the function call tracking module is specifically configured to read the corresponding program elf file according to the current instruction, disassemble it, and output the current function call information to a log file.
8. The RISC-V based hardware development system of claim 1, wherein: the peripheral access tracking module is specifically configured to determine, from the address of the current memory access, whether the access is a peripheral access, and to output the address of the peripheral access, the name of the peripheral and the access data.
9. The RISC-V based hardware development system of claim 1, wherein: the differential debugging comparison module is specifically configured to judge whether the current instruction was executed correctly according to the register states of the processor and of the instruction simulator after the instruction has executed, and to return the result to the hardware development system.
10. The RISC-V based hardware development system of claim 1, wherein: the comprehensive debugging module is specifically configured for single-step debugging of the processor, printing registers, setting monitoring points, evaluating expressions, scanning memory and exiting the simulation.
CN202410770378.3A 2024-06-14 2024-06-14 A hardware development system based on RISC-V Pending CN118605909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410770378.3A CN118605909A (en) 2024-06-14 2024-06-14 A hardware development system based on RISC-V

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410770378.3A CN118605909A (en) 2024-06-14 2024-06-14 A hardware development system based on RISC-V

Publications (1)

Publication Number Publication Date
CN118605909A true CN118605909A (en) 2024-09-06

Family

ID=92564402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410770378.3A Pending CN118605909A (en) 2024-06-14 2024-06-14 A hardware development system based on RISC-V

Country Status (1)

Country Link
CN (1) CN118605909A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination