
WO2016127600A1 - Exception handling method and apparatus - Google Patents

Exception handling method and apparatus

Info

Publication number
WO2016127600A1
Authority
WO
WIPO (PCT)
Prior art keywords
exception
instruction
address
pci
memory space
Prior art date
Application number
PCT/CN2015/086164
Other languages
English (en)
French (fr)
Inventor
蒋习旺
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016127600A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance

Definitions

  • the present invention relates to the field of communications, and in particular to an exception handling method and apparatus.
  • Peripheral Component Interconnect Express (PCI-E) is a bus standard and interface, announced by Intel in 2001 as the third-generation PCI bus under the name "3GIO".
  • PCI-E is a high-speed, serial, point-to-point, dual-channel, high-bandwidth transport. Connected devices are allocated dedicated channel bandwidth and do not share bus bandwidth. It mainly supports active power management, error reporting, end-to-end reliable transmission, hot plugging, and Quality of Service (QoS).
  • PCI-E defines four address spaces: memory, I/O, configuration, and message.
  • On a PowerPC processor platform, after the Central Processing Unit (CPU) issues a read instruction that targets the PCI-E memory space, the processor first maps the local address in the instruction to the PCI-E controller through the local access windows, then uses the outbound ATMU windows to map the processor address to an address in the PCI-E domain.
  • The transaction layer of the PCI-E controller builds one or more TLPs from the processor's access type and the mapped PCI-E address. These TLPs are sent through the PCI-E link layer and physical layer to the far end of the bus, and the controller waits for the far end's completion message to end the transaction.
  • When the processor issues a read access to an address that has no storage backing it, the PCI-E controller raises an exception because it times out waiting for the completion message.
  • The handler determines whether the instruction that triggered the exception ran in user mode or in kernel mode. In user mode, the handler sends a SIGBUS signal to the user-mode process, and the default action of a process that receives SIGBUS is to terminate. In kernel mode, the handler prints the current processor context information and exits the exception.
  • Because the PCI-E memory access instruction did not complete, the kernel re-executes it after the exception returns. The instruction again accesses a PCI-E address with no backing storage, so the exception is raised again, and the kernel ends up in an infinite loop.
  • In a system that supports PCI-E hot plugging, a PCI-E device may lose power or be unplugged at any time. Suppose the processor is continuously reading the memory space of a PCI-E device and the device suddenly loses power during the access. If the access was initiated by a user-mode process, the process is killed. If it was initiated in kernel mode, the failed access sends the system into an infinite loop and the whole machine goes down.
  • an embodiment of the present invention provides an exception handling method and apparatus.
  • an exception handling method, including: detecting whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space; if so, setting the return value of the faulting instruction that caused the exception to an illegal value; and using the address following the faulting instruction as the return address of the current exception.
  • detecting whether an exception occurs while accessing the PCI-E memory space includes: detecting whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
  • the operation address is obtained in one of the following ways: when the current exception is determined to be caused by a load-class instruction, obtaining the operation address from the load-class instruction; or reading the operation address from the machine check interrupt status register MCSR.
  • obtaining the operation address from the load-class instruction includes: according to the format type of the load-class instruction, obtaining the operation address from the position in the instruction that corresponds to that format type.
  • setting the return value of the faulting instruction that caused the current exception to an illegal value includes: writing the illegal value into the data register corresponding to the faulting instruction.
  • the exception includes at least one of the following: an exception caused by a load-class instruction, and a read-data bus error on the bus.
  • an exception handling apparatus, comprising: a detecting module configured to detect whether an exception occurs while accessing the PCI-E memory space; a setting module configured to set, when an exception occurs, the return value of the faulting instruction that caused the exception to an illegal value; and a determining module configured to use the address following the faulting instruction as the return address of the current exception.
  • the detecting module is configured to detect whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
  • the apparatus further includes an acquiring module configured to acquire the operation address, wherein the acquiring module is further configured to: when the current exception is determined to be caused by a load-class instruction, acquire the operation address from the load-class instruction; or read the operation address from the machine check interrupt status register MCSR.
  • the acquiring module is further configured to acquire the operation address, according to the format type of the load-class instruction, from the position in the instruction that corresponds to that format type.
  • when an exception occurs, the address following the faulting instruction is used as the return address of the current exception, and the return value of the faulting instruction is set to an illegal value.
  • FIG. 2 is a block diagram showing the structure of an exception handling apparatus according to an embodiment of the present invention.
  • FIG. 3 is a block diagram showing another structure of an exception processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a processing method for processing a read pcie memory exception using a method of analyzing an instruction according to a preferred embodiment of the present invention
  • FIG. 5 illustrates exception handling of pcie memory using registers provided by the powerpc platform architecture in accordance with a preferred embodiment of the present invention.
  • FIG. 1 is a flowchart of an exception handling method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
  • Step S102: detect whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space.
  • Step S104: if so, set the return value of the faulting instruction that caused the exception to an illegal value.
  • Step S106: use the address following the faulting instruction as the return address of the current exception.
  • Through these steps, when an exception occurs, the address following the faulting instruction is used as the return address of the current exception, and the return value of the faulting instruction is set to an illegal value.
  • This solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, thereby improving the robustness and survivability of the system.
  • the illegal value may be preset; in the embodiments of the present invention it is preferably hexadecimal F.
  • Whether an exception occurs while accessing the PCI-E memory space may be determined by detecting whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
  • The operation address may be obtained in one of the following ways: when the current exception is determined to be caused by a load-class instruction, obtaining the operation address from the load-class instruction; or reading the operation address from the machine check interrupt status register MCSR.
  • Because load-class instructions come in several format types, in the implementation of the present invention the operation address is obtained from the position in the load-class instruction that corresponds to its format type.
  • the exception includes at least one of the following: an exception caused by a load-class instruction, and a read-data bus error on the bus.
  • In summary, the technical solution of the embodiment of the present invention resolves the exception caused when a powerpc-architecture processor performs a read access to a pcie memory address that has no backing storage, so that the user-mode process is not killed and the system does not hang in an infinite loop, which improves the robustness and survivability of the system.
  • Step 1) Determine the cause of the exception
  • First, the exception handling flow determines the cause of the exception, namely whether the processor accessed a pcie memory address that has no backing storage. If this is not the case, the exception proceeds along the original flow. Otherwise, the exception is processed further.
  • Step 2) Determine whether the faulting address is within the pcie memory address range.
  • Step 3) Fill the return value with all Fs
  • The return value of the load memory instruction is filled with all Fs. The exception return address is changed to the address following the instruction that caused the exception, the subsequent printing of the kernel exception record or sending of the SIGBUS signal to the user process is skipped, and the handler returns directly.
  • The above technical solution provided by the embodiment of the present invention simply and effectively prevents a process from being killed or the system from hanging after a read access to PCIe raises an exception.
  • An exception handling apparatus is also provided in this embodiment to implement the above embodiments and preferred implementations.
  • The modules involved in the apparatus are described below.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a block diagram showing the structure of an exception handling apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus comprises:
  • the detecting module 20, configured to detect whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space;
  • the setting module 22, connected to the detecting module 20 and configured to set, when an exception occurs, the return value of the faulting instruction that caused the exception to an illegal value;
  • the determining module 24, connected to the setting module 22 and configured to use the address following the faulting instruction as the return address of the current exception.
  • Used together, these modules solve the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, thereby improving the robustness and survivability of the system.
  • FIG. 3 is a block diagram of another structure of an exception handling apparatus according to an embodiment of the present invention, wherein the detecting module 20 is configured to detect whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage unit, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
  • the apparatus further includes an obtaining module 26 configured to obtain the operation address, wherein the obtaining module 26 is further configured to: when the current exception is determined to be caused by a load-class instruction, obtain the operation address from the load-class instruction; or read the operation address from the machine check interrupt status register MCSR.
  • the obtaining module 26 is further configured to obtain the operation address, according to the format type of the load-class instruction, from the position in the instruction that corresponds to that format type.
  • Step S402: exception entry. powerpc instructions belong to a reduced instruction set with a uniform encoding: all instructions have the same length, and the op-code is always in the same position. This property lets the cause of the exception be determined by analysing the instruction op-code.
  • Step S404: using the struct pt_regs saved on exception entry, fetch the instruction that caused the exception via regs->nip. Take bits [0:5] of the instruction and determine whether it is a load-class instruction. If so, proceed to step S406; otherwise go to step S410.
  • To obtain the operation address of the faulting instruction, the load instruction is analysed further.
  • The address the instruction is to access can be extracted according to the specific load instruction.
  • Each load instruction encodes its memory operand in a different format, so the address is extracted per instruction.
  • Step S406: obtain the address range covered by pcie from the linux kernel structure variable struct pci_controller hose_head, and check whether the address extracted in step S404 falls within this range. If so, perform step S408; otherwise perform step S410.
  • Step S408: from the struct pt_regs saved at exception time, fill the data register regs->gpr[0] of the load instruction that caused the exception with all Fs.
  • Step S410: execute the original exception handling flow.
  • Another implementation obtains and analyses the exception information from the powerpc core registers.
  • The handling of read-access pcie memory exceptions using the core registers is described further below with reference to FIG. 5.
  • Step S504: in the powerpc architecture, when an exception occurs, the Machine Check Address Register (MCAR) holds the address operated on by the instruction that caused the exception.
  • Step S506: read the MCAR register to obtain the operation address that the faulting instruction was to access, and determine whether this address lies within the address range covered by pcie. If so, perform step S508; otherwise perform step S510.
  • Step S508: from the struct pt_regs saved at exception time, fill the data register regs->gpr[0] of the load instruction that caused the exception with all Fs.
  • Step S510: execute the original exception handling flow.
  • The embodiment of the present invention achieves the following technical effect: it solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, thereby improving the robustness and survivability of the system.
  • A storage medium is further provided that stores the above-mentioned software, including but not limited to: an optical disk, a floppy disk, a hard disk, an erasable memory, and the like.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device, either centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here, or they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • The invention is not limited to any specific combination of hardware and software.
  • The above technical solution provided by the present invention can be applied to exception handling: when an exception occurs, the address following the faulting instruction is used as the return address of the current exception, and the return value of the faulting instruction is set to an illegal value.
  • This technical means solves the problem that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, thereby improving the robustness and survivability of the system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Debugging And Monitoring (AREA)

Abstract

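An exception handling method and apparatus. The method includes: detecting whether an exception occurs while accessing the PCI-E memory space (S102); if so, setting the return value of the faulting instruction that caused the exception to an illegal value (S104); and using the address following the faulting instruction as the return address of the current exception (S106). This technical solution solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, and thereby improves the robustness and survivability of the system.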

Description

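Exception handling method and apparatus
Technical Field
The present invention relates to the field of communications, and in particular to an exception handling method and apparatus.
Background
Peripheral Component Interconnect Express (PCI-E) is a bus standard and interface; it is the third-generation PCI bus, announced by Intel in 2001 under the name "3GIO". PCI-E is a high-speed, serial, point-to-point, dual-channel, high-bandwidth transport. Connected devices are allocated dedicated channel bandwidth and do not share bus bandwidth. It mainly supports active power management, error reporting, end-to-end reliable transmission, hot plugging, and Quality of Service (QoS).
PCI-E defines four address spaces: memory, I/O, configuration, and message. In the PCI-E architecture, accessing a memory address that has no storage backing it raises an exception. How this kind of exception is handled differs slightly between processor platforms.
On a reduced-instruction-set POWERPC (Performance Optimization With Enhanced RISC - Performance Computing) processor platform, after the Central Processing Unit (CPU) issues a read instruction that targets the PCI-E memory space, the processor first maps the local address in the instruction to the PCI-E controller through the local access windows. It then uses the outbound ATMU windows to map the processor address to an address in the PCI-E domain. The transaction layer of the PCI-E controller builds one or more TLPs from the processor's access type and the mapped PCI-E address. Finally, these TLPs are delivered through the PCI-E link layer and physical layer to the far end of the bus, and the controller waits for the far end's completion message to end the transaction.
When the processor issues a read access to an address that has no storage backing it, the PCI-E controller raises an exception because it times out waiting for the completion message. During exception handling, the handler determines whether the instruction that triggered the exception ran in user mode or in kernel mode. In user mode, the handler sends a SIGBUS signal to the user-mode process, and the default action of a process that receives SIGBUS is to terminate. In kernel mode, the handler prints the current processor context information and exits the exception. Because the PCI-E memory access instruction did not complete, the kernel re-executes it after the exception returns. The instruction again accesses a PCI-E address with no backing storage, so the exception is raised again, and the kernel ends up in an infinite loop.
In a system that supports PCI-E hot plugging, a PCI-E device may lose power or be unplugged at any time. Suppose the processor is continuously reading the memory space of a PCI-E device and the device suddenly loses power during the access. If the access was initiated by a user-mode process, the process is killed. If it was initiated in kernel mode, the failed access sends the system into an infinite loop and the whole machine goes down.
No effective solution has yet been proposed for the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop.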
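Summary of the Invention
To solve the above technical problem, embodiments of the present invention provide an exception handling method and apparatus.
According to one embodiment of the present invention, an exception handling method is provided, including: detecting whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space; if so, setting the return value of the faulting instruction that caused the exception to an illegal value; and using the address following the faulting instruction as the return address of the current exception.
In an embodiment of the present invention, detecting whether an exception occurs while accessing the PCI-E memory space includes: detecting whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
In an embodiment of the present invention, the operation address is obtained in one of the following ways: when the current exception is determined to be caused by a load-class instruction, obtaining the operation address from the load-class instruction; or reading the operation address from the machine check interrupt status register MCSR.
In an embodiment of the present invention, obtaining the operation address from the load-class instruction includes: according to the format type of the load-class instruction, obtaining the operation address from the position in the instruction that corresponds to that format type.
In an embodiment of the present invention, setting the return value of the faulting instruction that caused the current exception to an illegal value includes: writing the illegal value into the data register corresponding to the faulting instruction.
In an embodiment of the present invention, the exception includes at least one of the following: an exception caused by a load-class instruction, and a read-data bus error on the bus.
According to another embodiment of the present invention, an exception handling apparatus is also provided, including: a detecting module configured to detect whether an exception occurs while accessing the PCI-E memory space; a setting module configured to set, when an exception occurs, the return value of the faulting instruction that caused the exception to an illegal value; and a determining module configured to use the address following the faulting instruction as the return address of the current exception.
In an embodiment of the present invention, the detecting module is configured to detect whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
In an embodiment of the present invention, the apparatus further includes an acquiring module configured to acquire the operation address, wherein the acquiring module is further configured to: when the current exception is determined to be caused by a load-class instruction, acquire the operation address from the load-class instruction; or read the operation address from the machine check interrupt status register MCSR.
In an embodiment of the present invention, the acquiring module is further configured to acquire the operation address, according to the format type of the load-class instruction, from the position in the instruction that corresponds to that format type.
Through the embodiments of the present invention, when an exception occurs, the address following the faulting instruction is used as the return address of the current exception and the return value of the faulting instruction is set to an illegal value. This solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, and thereby improves the robustness and survivability of the system.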
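Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of an exception handling method according to an embodiment of the present invention;
FIG. 2 is a block diagram showing the structure of an exception handling apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram showing another structure of an exception handling apparatus according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method that handles read pcie memory exceptions by analysing the instruction, according to a preferred embodiment of the present invention;
FIG. 5 illustrates pcie memory exception handling implemented with registers provided by the powerpc platform architecture, according to a preferred embodiment of the present invention.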
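Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
Other features and advantages of the present invention are set forth in the description that follows, and in part become apparent from the description or are learned by practising the invention. The objectives and other advantages of the invention can be realised and attained by the structure particularly pointed out in the written description, the claims, and the drawings.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
This embodiment provides an exception handling method. FIG. 1 is a flowchart of an exception handling method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: detect whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space;
Step S104: if so, set the return value of the faulting instruction that caused the exception to an illegal value;
Step S106: use the address following the faulting instruction as the return address of the current exception.
Through these steps, when an exception occurs, the address following the faulting instruction is used as the return address of the current exception and the return value of the faulting instruction is set to an illegal value. This solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, and thereby improves the robustness and survivability of the system.
Optionally, the illegal value may be preset; in the embodiments of the present invention it is preferably hexadecimal F.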
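The text does not say how the code that issued the read should react to the illegal value. The sketch below is one plausible caller-side convention, not something the patent specifies: a 32-bit read that comes back as all Fs is treated as "no device answered". The names `PCIE_READ_INVALID` and `pcie_read32_checked` are hypothetical, and the pointer stands in for a mapped PCI-E register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel: the all-F value the handler is said to plant. */
#define PCIE_READ_INVALID 0xFFFFFFFFu

/*
 * Hypothetical helper: read a 32-bit value and report whether it looks
 * valid. Returns false, leaving *out untouched, when the value equals
 * the all-F sentinel.
 */
static bool pcie_read32_checked(const volatile uint32_t *reg, uint32_t *out)
{
    assert(reg != NULL && out != NULL);
    uint32_t value = *reg;
    if (value == PCIE_READ_INVALID)
        return false;   /* treat as "device absent or powered down" */
    *out = value;
    return true;
}
```

A limitation of this convention is that a register whose legitimate content is all Fs cannot be told apart from a failed read, so a caller would need a second check for such registers.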
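Whether an exception occurs while accessing the PCI-E memory space may be determined by detecting whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
The operation address may be obtained in one of the following ways: when the current exception is determined to be caused by a load-class instruction, obtaining the operation address from the load-class instruction; or reading the operation address from the machine check interrupt status register MCSR. Because load-class instructions come in several format types, in the implementation of the present invention the operation address is obtained from the position in the load-class instruction that corresponds to its format type.
In an embodiment of the present invention, the exception includes at least one of the following: an exception caused by a load-class instruction, and a read-data bus error on the bus.
In summary, the technical solution of the embodiment of the present invention resolves the exception caused when a powerpc-architecture processor performs a read access to a pcie memory address that has no backing storage, so that the user-mode process is not killed and the system does not hang in an infinite loop, which improves the robustness and survivability of the system.
The exception handling method provided in the above embodiment is illustrated below with an example, which does not limit the embodiments of the present invention:
Step 1) Determine the cause of the exception
First, the exception handling flow determines the cause of the exception, namely whether the processor accessed a pcie memory address that has no backing storage. If this is not the case, the exception proceeds along the original flow. Otherwise, the exception is processed further.
Step 2) Determine whether the faulting address is within the pcie memory address range
Once the cause is established, the operation address of the faulting instruction is obtained and recorded, and the memory address ranges managed by all pcie controllers in the system are collected. Finally, the recorded operation address is checked against the pcie memory space. If it belongs to that space, the case meets the conditions that the embodiment of the present invention handles. Otherwise, the original exception handling flow is executed.
Step 3) Fill the return value with all Fs
When both conditions above are met, the return value of the load memory instruction is filled with all Fs. The exception return address is changed to the address following the instruction that caused the exception, the subsequent printing of the kernel exception record or sending of the SIGBUS signal to the user process is skipped, and the handler returns directly.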
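The three steps can be sketched in plain C as follows. This is a user-space illustration under stated assumptions, not kernel code from the patent: `struct fault_frame` is a simplified stand-in for the saved register frame, `struct pcie_window` is a hypothetical description of one controller's memory range, and the caller is assumed to have already determined the cause, the operation address, and the destination register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the saved register frame (not struct pt_regs). */
struct fault_frame {
    uint64_t gpr[32];   /* general-purpose registers */
    uint64_t nip;       /* address of the faulting instruction */
};

/* Hypothetical description of one PCIe memory window, inclusive bounds. */
struct pcie_window {
    uint64_t start;
    uint64_t end;
};

static bool in_pcie_memory(const struct pcie_window *w, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= w[i].start && addr <= w[i].end)
            return true;
    }
    return false;
}

/*
 * Returns true when the fault was absorbed here; false means the caller
 * should continue with the original exception handling flow.
 */
static bool handle_pcie_read_fault(struct fault_frame *f,
                                   bool cause_is_unbacked_read,
                                   uint64_t op_addr, unsigned dest_reg,
                                   const struct pcie_window *w, size_t n)
{
    assert(f != NULL);
    if (!cause_is_unbacked_read)                  /* step 1: wrong cause */
        return false;
    if (dest_reg >= 32 || !in_pcie_memory(w, n, op_addr))
        return false;                             /* step 2: not PCIe memory */
    f->gpr[dest_reg] = ~(uint64_t)0;              /* step 3: all Fs */
    f->nip += 4;                                  /* skip the 4-byte instruction */
    return true;
}
```

Advancing `nip` by 4 relies on the fixed instruction length described for powerpc; on an architecture with variable-length instructions the skip distance would have to be decoded instead.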
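The above technical solution provided by the embodiment of the present invention simply and effectively prevents a process from being killed or the system from hanging after a read access to PCIe raises an exception.
Note that, for brevity, each of the foregoing method embodiments is presented as a series of actions. Those skilled in the art should understand that the present invention is not limited by the order of actions described, because according to the invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the invention.
An exception handling apparatus is also provided in this embodiment to implement the above embodiments and preferred implementations; what has already been explained is not repeated. The modules involved in the apparatus are described below. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. FIG. 2 is a block diagram showing the structure of an exception handling apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes:
a detecting module 20, configured to detect whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space;
a setting module 22, connected to the detecting module 20 and configured to set, when an exception occurs, the return value of the faulting instruction that caused the exception to an illegal value;
a determining module 24, connected to the setting module 22 and configured to use the address following the faulting instruction as the return address of the current exception.
Used together, these modules apply the technique of using the address following the faulting instruction as the return address of the current exception and setting the return value of the faulting instruction to an illegal value. This solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, and thereby improves the robustness and survivability of the system.
FIG. 3 is a block diagram of another structure of an exception handling apparatus according to an embodiment of the present invention, wherein the detecting module 20 is configured to detect whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage unit, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
Optionally, the apparatus further includes an obtaining module 26 configured to obtain the operation address, wherein the obtaining module 26 is further configured to: when the current exception is determined to be caused by a load-class instruction, obtain the operation address from the load-class instruction; or read the operation address from the machine check interrupt status register MCSR.
Further, the obtaining module 26 is also configured to obtain the operation address, according to the format type of the load-class instruction, from the position in the instruction that corresponds to that format type.
For a better understanding of the above exception handling process, it is described below with reference to preferred embodiments, which do not limit the scope of protection of the embodiments of the present invention.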
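The method that handles read pcie memory exceptions by analysing the instruction is described in further detail below with reference to FIG. 4:
Step S402: exception entry. powerpc instructions belong to a reduced instruction set with a uniform encoding: all instructions have the same length, and the op-code is always in the same position. This property lets the cause of the exception be determined by analysing the instruction op-code.
Step S404: using the struct pt_regs saved on exception entry, fetch the instruction that caused the exception via regs->nip. Take bits [0:5] of the instruction and determine whether it is a load-class instruction. If so, proceed to step S406; otherwise go to step S410.
To obtain the operation address of the faulting instruction, the load instruction must be analysed further. The address the instruction is to access can be extracted according to the specific load instruction. Each load instruction encodes its memory operand in a different format, so the address must be extracted per instruction.
Step S406: obtain the address range covered by pcie from the linux kernel structure variable struct pci_controller hose_head, and check whether the address extracted in step S404 falls within this range. If so, perform step S408; otherwise perform step S410.
Step S408: from the struct pt_regs saved at exception time, fill the data register regs->gpr[0] of the load instruction that caused the exception with all Fs, and point the exception return address at the address following the faulting instruction: regs->nip=(regs->nip)+4;. When control returns to the program, the program does not notice that an exception occurred; it is as if the load instruction had read an illegal all-F value from the pcie memory space.
Step S410: execute the original exception handling flow.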
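The instruction analysis in step S404 can be illustrated for one instruction format. The sketch below decodes only the non-updating D-form loads (primary op-codes 32 `lwz`, 34 `lbz`, 40 `lhz`, 42 `lha`), where bits [0:5] hold the op-code, bits [6:10] the target register RT, bits [11:15] the base register RA, and bits [16:31] a signed displacement, with bit 0 being the most significant bit. The struct and function names are hypothetical, and X-form loads (primary op-code 31) and update forms are deliberately left out. Using the decoded RT as the register to fill is an assumption of this sketch; the text above writes regs->gpr[0].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding one D-form load (hypothetical type). */
struct dform_load {
    unsigned rt;    /* destination register index */
    uint64_t ea;    /* effective address the load would access */
};

/*
 * Decode a 32-bit powerpc instruction word. Returns true and fills *out
 * if it is one of the handled D-form loads; returns false otherwise.
 */
static bool decode_dform_load(uint32_t insn, const uint64_t gpr[32],
                              struct dform_load *out)
{
    assert(gpr != NULL && out != NULL);

    unsigned opcode = insn >> 26;               /* bits [0:5] */
    switch (opcode) {
    case 32: case 34: case 40: case 42:         /* lwz, lbz, lhz, lha */
        break;
    default:
        return false;
    }

    unsigned rt = (insn >> 21) & 0x1f;          /* bits [6:10] */
    unsigned ra = (insn >> 16) & 0x1f;          /* bits [11:15] */
    int64_t disp = (int64_t)(insn & 0xffff);    /* bits [16:31] */
    if (disp & 0x8000)
        disp -= 0x10000;                        /* sign-extend 16 bits */

    uint64_t base = (ra == 0) ? 0 : gpr[ra];    /* RA = 0 means literal 0 */
    out->rt = rt;
    out->ea = base + (uint64_t)disp;
    return true;
}
```

For example, the word `0x80640008` encodes `lwz r3, 8(r4)`, so with `gpr[4] = 0x80001000` the decoded effective address is `0x80001008` and the destination register is 3.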
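Another implementation obtains and analyses the exception information from the powerpc core registers. The handling of read-access pcie memory exceptions using the core registers is described further below with reference to FIG. 5.
Step S502: at the exception entry, the exception type is determined by reading the MCSR register. If bit 60 of the Machine Check Syndrome Register (MCSR) is set, that is, MCSR[60]=1, a "Bus read data bus error" exception has occurred. This is the type of exception in question.
Step S504: in the powerpc architecture, when an exception occurs, the Machine Check Address Register (MCAR) holds the address operated on by the instruction that caused the exception.
Step S506: read the MCAR register to obtain the operation address that the faulting instruction was to access, and determine whether this address lies within the address range covered by pcie. If so, perform step S508; otherwise perform step S510.
Step S508: from the struct pt_regs saved at exception time, fill the data register regs->gpr[0] of the load instruction that caused the exception with all Fs, and point the exception return address at the address following the faulting instruction: regs->nip=(regs->nip)+4;. When control returns to the program, the program does not notice that an exception occurred; it is as if the load instruction had read an illegal all-F value from the pcie memory space.
Step S510: execute the original exception handling flow.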
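The register-based checks of FIG. 5 reduce to one bit test and one range test. The sketch below assumes that MCSR[60] uses the numbering in which bit 0 is the most significant bit of a 64-bit value, so bit 60 corresponds to the mask 0x8. The `struct mc_snapshot` type, holding values already read from the two registers, and the function name are hypothetical; reading the actual special-purpose registers is platform code and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask for bit n when bit 0 is the most significant of 64 bits. */
#define IBM_BIT64(n) (1ull << (63 - (n)))

/* MCSR[60]: "Bus read data bus error" per the text; evaluates to 0x8. */
#define MCSR_READ_DATA_BUS_ERR IBM_BIT64(60)

/* Hypothetical snapshot of the two registers taken at exception entry. */
struct mc_snapshot {
    uint64_t mcsr;   /* syndrome bits */
    uint64_t mcar;   /* address the faulting access targeted */
};

/*
 * Returns true when the snapshot describes a read-data bus error whose
 * target address lies inside the inclusive window [win_start, win_end].
 */
static bool is_pcie_read_bus_error(const struct mc_snapshot *s,
                                   uint64_t win_start, uint64_t win_end)
{
    assert(s != NULL);
    if ((s->mcsr & MCSR_READ_DATA_BUS_ERR) == 0)
        return false;                            /* S502: other exception type */
    return s->mcar >= win_start && s->mcar <= win_end;   /* S506 */
}
```

Compared with decoding the instruction, this path needs no knowledge of load formats, at the cost of depending on registers that only some powerpc cores provide.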
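In summary, the embodiment of the present invention achieves the following technical effect: it solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, and thereby improves the robustness and survivability of the system.
In another embodiment, software is also provided that executes the technical solutions described in the above embodiments and preferred implementations.
In another embodiment, a storage medium is also provided that stores the above software, including but not limited to: an optical disk, a floppy disk, a hard disk, an erasable memory, and the like.
Note that the terms "first", "second", and so on in the specification, the claims, and the drawings of the present invention distinguish similar objects and do not necessarily describe a particular order or sequence. Objects so described are interchangeable where appropriate, so that the embodiments described here can be practised in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device, either centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here, or they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and do not limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.
Industrial Applicability
The above technical solution provided by the present invention can be applied to exception handling: when an exception occurs, the address following the faulting instruction is used as the return address of the current exception, and the return value of the faulting instruction is set to an illegal value. This solves the problem in the related art that an exception during access to a pcie memory address stops the process or hangs the system in an infinite loop, and thereby improves the robustness and survivability of the system.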

Claims (10)

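  1. An exception handling method, comprising:
    detecting whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space;
    if so, setting the return value of the faulting instruction that caused the exception to an illegal value;
    using the address following the faulting instruction as the return address of the current exception.
  2. The method according to claim 1, wherein detecting whether an exception occurs while accessing the PCI-E memory space comprises:
    detecting whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; wherein, if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
  3. The method according to claim 2, wherein the operation address is obtained in one of the following ways:
    when the current exception is determined to be caused by a load-class instruction, obtaining the operation address from the load-class instruction; or
    reading the operation address from the machine check interrupt status register MCSR.
  4. The method according to claim 3, wherein obtaining the operation address from the load-class instruction comprises:
    according to the format type of the load-class instruction, obtaining the operation address from the position in the load-class instruction that corresponds to the format type.
  5. The method according to claim 1, wherein setting the return value of the faulting instruction that caused the current exception to an illegal value comprises:
    writing the illegal value into the data register corresponding to the faulting instruction.
  6. The method according to any one of claims 1 to 5, wherein the exception comprises at least one of the following:
    an exception caused by a load-class instruction, and a read-data bus error on the bus.
  7. An exception handling apparatus, comprising:
    a detecting module, configured to detect whether an exception occurs while accessing the Peripheral Component Interconnect Express (PCI-E) memory space;
    a setting module, configured to set, when an exception occurs, the return value of the faulting instruction that caused the exception to an illegal value;
    a determining module, configured to use the address following the faulting instruction as the return address of the current exception.
  8. The apparatus according to claim 7, wherein the detecting module is configured to detect whether an exception has occurred because the accessed PCI-E memory space has no corresponding storage entity, and whether the operation address that the faulting instruction was to access lies within the address range of the PCI-E memory space; wherein, if both hold, it is determined that an exception occurred while accessing the PCI-E memory space.
  9. The apparatus according to claim 8, wherein the apparatus further comprises an acquiring module configured to acquire the operation address, wherein the acquiring module is further configured to: when the current exception is determined to be caused by a load-class instruction, acquire the operation address from the load-class instruction; or read the operation address from the machine check interrupt status register MCSR.
  10. The apparatus according to claim 9, wherein the acquiring module is further configured to acquire the operation address, according to the format type of the load-class instruction, from the position in the load-class instruction that corresponds to the format type.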
PCT/CN2015/086164 2015-02-12 2015-08-05 Exception handling method and apparatus WO2016127600A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510076615.7A CN105988905A (zh) 2015-02-12 2015-02-12 Exception handling method and apparatus
CN201510076615.7 2015-02-12

Publications (1)

Publication Number Publication Date
WO2016127600A1 true WO2016127600A1 (zh) 2016-08-18

Family

ID=56614198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/086164 WO2016127600A1 (zh) 2015-02-12 2015-08-05 Exception handling method and apparatus

Country Status (2)

Country Link
CN (1) CN105988905A (zh)
WO (1) WO2016127600A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
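CN111124726A (zh) * 2019-12-09 2020-05-08 上海移远通信技术股份有限公司 Method and apparatus for detecting exceptions when opening a modem port
CN114647546A (zh) * 2022-03-30 2022-06-21 苏州浪潮智能科技有限公司 Chassis exception handling method and apparatus, electronic device, and storage medium
CN117573418A (zh) * 2024-01-15 2024-02-20 北京趋动智能科技有限公司 Method, system, medium, and device for handling video memory access exceptions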

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
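CN110825593B (zh) * 2019-11-11 2022-08-23 腾讯科技(深圳)有限公司 Method, apparatus, device, and storage medium for detecting abnormal process states
CN113467981A (zh) * 2020-03-31 2021-10-01 华为技术有限公司 Exception handling method and apparatus
CN111682991B (zh) * 2020-05-28 2022-08-12 杭州迪普科技股份有限公司 Method and apparatus for processing bus error messages
CN116680208B (zh) * 2022-12-16 2024-05-28 荣耀终端有限公司 Exception identification method and electronic device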

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060041863A1 (en) * 2002-08-05 2006-02-23 Kazunori Saito Data processing method, date processing device computer program and recording medium
US20080148016A1 (en) * 2006-12-13 2008-06-19 Fujitsu Limited Multiprocessor system for continuing program execution upon detection of abnormality
CN101625656A (zh) * 2009-07-28 2010-01-13 杭州华三通信技术有限公司 Method and apparatus for handling PCI system exceptions
CN103309762A (zh) * 2013-06-21 2013-09-18 杭州华三通信技术有限公司 Device exception handling method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533370B (zh) * 2009-04-09 2011-10-26 成都市华为赛门铁克科技有限公司 Method and apparatus for locating abnormal memory accesses
US9430349B2 (en) * 2013-01-24 2016-08-30 Xcerra Corporation Scalable test platform in a PCI express environment with direct memory access

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
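CN111124726A (zh) * 2019-12-09 2020-05-08 上海移远通信技术股份有限公司 Method and apparatus for detecting exceptions when opening a modem port
CN111124726B (zh) * 2019-12-09 2024-01-26 上海移远通信技术股份有限公司 Method and apparatus for detecting exceptions when opening a modem port
CN114647546A (zh) * 2022-03-30 2022-06-21 苏州浪潮智能科技有限公司 Chassis exception handling method and apparatus, electronic device, and storage medium
CN117573418A (zh) * 2024-01-15 2024-02-20 北京趋动智能科技有限公司 Method, system, medium, and device for handling video memory access exceptions
CN117573418B (zh) * 2024-01-15 2024-04-23 北京趋动智能科技有限公司 Method, system, medium, and device for handling video memory access exceptions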

Also Published As

Publication number Publication date
CN105988905A (zh) 2016-10-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15881759

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15881759

Country of ref document: EP

Kind code of ref document: A1