
WO2014206078A1 - Memory access method, device and system - Google Patents


Info

Publication number
WO2014206078A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
memory
accessed
target process
operating system
Application number
PCT/CN2014/071252
Other languages
French (fr)
Chinese (zh)
Inventor
褚力行
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2014206078A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/502 Proximity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Definitions

  • TECHNICAL FIELD Embodiments of the present invention relate to computer technologies, and in particular, to a memory access method, apparatus, and system.
  • a server system can be composed of one or more servers, and each server acts as a node to form a non-uniform memory access (NUMA) architecture.
  • NUMA non-uniform memory access
  • each node may include one or more central processing units (CPUs), and each CPU may be configured with a certain memory resource in advance.
  • a process running on a CPU can access the memory resources in the server system in one of three ways: local memory access, adjacent (buddy) memory access, and remote memory access.
  • a process running on a CPU accessing that CPU's own memory resources is local memory access;
  • a process running on a CPU accessing the memory resources of other CPUs in the node to which that CPU belongs is adjacent memory access;
  • a process running on a CPU accessing the memory resources of CPUs in nodes other than the node to which that CPU belongs is remote memory access.
  • Embodiments of the present invention provide a memory access method, apparatus, and system for improving overall performance of a server system.
  • a first aspect of the embodiments of the present invention provides a memory access method, including: receiving, by a node controller, monitoring information sent by an operating system, where the monitoring information carries information about monitored memory in a first node to which the node controller belongs;
  • the monitored memory is a memory resource occupied by a target process on the first node, and the operating system runs on each node of a server system consisting of at least two nodes including the first node;
  • the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system;
  • if the node controller detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, it sends the information of the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information of the accessed node.
  • the threshold is the ratio of the number of accesses to the memory of other nodes made, within a preset time, by the processes running on all CPUs in the first node to the total number of CPUs in the first node.
  • the node controller is a node controller (NC) chip;
  • the receiving, by the node controller, of the monitoring information sent by the operating system includes:
  • receiving, by the NC chip, the monitoring information sent by the operating system through a motherboard management control unit of the first node;
  • the sending of the information of the accessed node to the operating system includes:
  • if the NC chip detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, sending the information of the accessed memory in the accessed node to the operating system through the motherboard management control unit.
  • a second aspect of the embodiments of the present invention provides a node controller, including: a receiving unit, configured to receive monitoring information sent by an operating system, where the monitoring information carries information about monitored memory in a first node to which the node controller belongs, the monitored memory is a memory resource occupied by a target process on the first node, the operating system runs on each node of a server system consisting of at least two nodes including the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system;
  • a monitoring unit, configured to, upon detecting that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, send the information of the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information of the accessed node.
  • the threshold is the ratio of the number of accesses to the memory of other nodes made, within a preset time, by the processes running on all CPUs in the first node to the total number of CPUs in the first node.
  • the receiving unit is specifically configured to receive, through a motherboard management control unit of the first node, the monitoring information sent by the operating system;
  • the monitoring unit is specifically configured to: if the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, send the information of the accessed memory in the accessed node to the operating system through the motherboard management control unit.
  • a third aspect of the embodiments of the present invention provides a server system, including at least two nodes, each including the node controller described above, and an operating system running in the server system;
  • if the operating system determines that a target process exists, it sends monitoring information to the node controller of the first node on which the target process runs, where the monitoring information carries information about the monitored memory in the first node to which the node controller belongs;
  • the monitored memory is a memory resource occupied by the target process on the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system;
  • the operating system migrates the target process to the accessed node.
  • if the operating system determines that the memory resources of the CPU in the accessed node to which the accessed memory belongs are sufficient to run the target process, it migrates the target process to that CPU; if it determines that the memory resources of that CPU are insufficient to run the target process but the memory resources of another CPU in the accessed node are sufficient, it migrates the target process to that other CPU of the accessed node.
  • FIG. 1 is a flowchart of a memory access method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of another memory access method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a node controller according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another node controller according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server system according to an embodiment of the present invention. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • embodiments of the present invention provide a memory access method that migrates a process meeting certain conditions, so that its remote memory access is converted into local memory access or adjacent memory access.
  • Embodiments of the present invention can be applied to a multi-server system composed of at least two nodes, each of which includes a node controller according to various embodiments of the present invention.
  • Each node can be a server, and each server can include one or more CPUs, and each CPU can be allocated a part of memory resources correspondingly.
  • an operating system (OS) can run on the server system.
  • FIG. 1 is a flowchart of a memory access method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
  • Step 101: the node controller receives the monitoring information sent by the operating system.
  • the monitoring information carries information about the monitored memory in the first node to which the node controller belongs, where the monitored memory is a memory resource occupied by the target process on the first node, and the operating system runs on each node of a server system consisting of at least two nodes including the first node.
  • the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system.
  • the operating system runs on each node of the server system; each node may run one or more processes on its CPUs, and these processes may access memory through local memory access, adjacent memory access, or remote memory access.
  • the embodiments of the present invention are mainly directed to processes that perform remote memory access. Therefore, the "target process" described in the embodiments refers to a process that performs remote memory access, that is, a process that accesses memory resources in nodes other than the node to which the CPU it runs on belongs.
  • the "first node" described in the embodiments of the present invention is the node to which the node controller executing the method belongs; the term distinguishes it from other nodes to avoid confusion.
  • the operating system manages each node and the processes running on it from a global perspective, so it knows the running status of the processes on each node, for example, which processes run on a node, which CPU each process runs on, and which part of which CPU's memory each process occupies.
  • the operating system also knows how each process accesses memory, including which part of which node's memory the process accesses. Therefore, when a process performs remote memory access, the operating system is aware of it.
  • when the operating system learns that a target process exists among the processes running in the server system, it sends monitoring information to the node controller of the first node on which the target process runs.
  • the monitoring information includes information about the memory resources occupied by the target process on the first node.
  • the part of the memory resources is referred to as “monitored memory”.
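  • Step 102: if the node controller detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, it sends the information of the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information of the accessed node.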
  • the node controller monitors the monitored memory indicated in the monitoring information.
  • in this way, the node controller of the first node can learn how the processes running on the first node access memory.
  • the node controller need not monitor the process directly: a process running on the first node occupies part of the local memory of the CPU it runs on, so the node controller can monitor that part of memory instead.
  • the node controller can also learn information about the memory accessed on other nodes. In the embodiments of the present invention, the node to which the memory accessed by the target process belongs is called the "accessed node", and the memory resource located on another node and accessed by the target process is called the "accessed memory".
  • the node controller monitors how the processes running in the monitored memory access the memory of other nodes; specifically, it can monitor the frequency at which the target process accesses the memory of other nodes.
  • the statistical period and threshold can be preset in the node controller.
  • the node controller counts the number of times the target process accesses the memory of the accessed node during each statistical period to obtain the frequency value. If the node controller determines that the frequency value is greater than or equal to the threshold, the target process is frequently performing remote memory access; to improve the performance of the server system, the node controller then sends the information of the accessed node to the operating system.
  • the information of the accessed node may include information about the accessed memory that the target process accesses in the accessed node. However, because the operating system knows which memory resources the target process accesses during remote access, the information of the accessed node need not carry the information of the accessed memory.
  • the operating system may migrate the target process to the accessed node, so that after the migration the target process accesses the memory of the accessed node through local or adjacent access, which effectively improves system performance.
  • the operating system need not migrate the target process immediately: it may first check the remaining memory resources in the accessed node and migrate the target process only when they are sufficient to run it, which further improves system performance.
  • the method for the operating system to migrate the process may use an implementation similar to that in the prior art, and will not be repeated here.
  • in the memory access method provided by this embodiment, after the node controller receives the monitoring information sent by the operating system, if it detects that the frequency at which the target process occupying the monitored memory remotely accesses the memory of the accessed node is greater than or equal to the threshold, it sends the information of the accessed node to the operating system, so that the operating system migrates the target process to the accessed node. Remote memory access is thus converted into local or adjacent memory access, which reduces the time the target process spends accessing memory and effectively improves the performance of the server system.
  • FIG. 2 is a flowchart of another memory access method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
  • the NC chip receives the monitoring information sent by the operating system through the motherboard management control unit.
  • unlike in step 101, communication between the NC chip and the operating system here is implemented through a motherboard management control unit disposed on the first node.
  • if the NC chip in the first node detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, it sends the information of the accessed memory in the accessed node to the operating system through the motherboard management control unit.
  • the connections between nodes are limited by the number of interfaces or interconnects on each node.
  • the communication protocol allows the nodes to be interconnected using a Node Controller (NC) chip.
  • NC Node Controller
  • Each node is provided with an NC chip, and the nodes can be communicably connected through respective NC chips.
  • the CPUs on two nodes can be interconnected at high speed through the NC chips, and the connections between nodes are no longer limited by the number of interfaces on each node.
  • the NC chip provides functions such as cache coherence checking and message forwarding.
  • the operating system can read and write to the registers on the NC chip on the node through the Baseboard Management Controller (BMC) on the node.
  • BMC Baseboard Management Controller
  • the BMC supports the industry-standard Intelligent Platform Management Interface (IPMI).
  • IPMI Intelligent Platform Management Interface
  • IPMI provides a variety of system interfaces.
  • KCS Keyboard Controller Style
  • the operating system can communicate with the BMC through the KCS.
  • the BMC can access the registers of the NC chip through the Inter-Integrated Circuit (IIC) to perform read and write operations, thereby monitoring or controlling the state of the NC chip.
  • IIC Inter-Integrated Circuit
  • the NC chip on the first node monitors the monitored memory and monitors the access of the target process running on the monitored memory to the memory of other nodes.
  • the threshold is the ratio of the number of accesses to the memory of other nodes made, within a preset time, by the processes running on all CPUs in the first node to the total number of CPUs in the first node.
  • the threshold can be set as needed, and a typical setting is as follows.
  • a time length is preset in the first node, and the number of remote memory accesses made within that time by the processes running on all CPUs connected to the NC chip of the first node is counted; this count includes accesses to shared memory in other nodes as well as accesses to non-shared memory in other nodes.
  • the ratio of this count to the number of CPUs in the first node is used as the threshold. That is, if the NC chip of the first node detects that the number of remote memory accesses made by the target process running on the monitored memory is greater than or equal to the average number of remote memory accesses per CPU, the target process performs remote memory access relatively frequently and needs to be migrated.
  • for example, suppose a node has eight CPUs and 1000 remote memory access requests pass through the node's NC chip within the preset time, that is, the processes running on the eight CPUs perform 1000 remote memory accesses in total. The average number of remote accesses per CPU within the preset time is then 1000 / 8 = 125.
  • the threshold is therefore set to 125.
  • if the target process performs 125 or more remote memory accesses within the preset time, it accesses remote memory relatively frequently and needs to be migrated.
  • FIG. 3 is a schematic structural diagram of a node controller according to an embodiment of the present invention. As shown in FIG. 3, the node controller includes:
  • the receiving unit 11 is configured to receive monitoring information sent by the operating system, where the monitoring information carries information about the monitored memory in the first node to which the node controller belongs, the monitored memory is a memory resource occupied by the target process on the first node, the operating system runs on each node of a server system consisting of at least two nodes including the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system;
  • the monitoring unit 12 is configured to: upon detecting that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, send the information of the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information of the accessed node.
  • the threshold with which the monitoring unit 12 compares the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is the ratio of the number of accesses to the memory of other nodes made, within a preset time, by the processes running on all CPUs in the first node to the total number of CPUs in the first node.
  • the receiving unit 11 is specifically configured to receive, through the motherboard management control unit of the first node, the monitoring information sent by the operating system;
  • the monitoring unit 12 is specifically configured to: if the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, send the information of the accessed memory in the accessed node to the operating system through the motherboard management control unit.
  • the node controller includes:
  • a processor 21, a memory 22, a bus 23, and a communication interface 24; the processor 21, the memory 22, and the communication interface 24 are connected through the bus 23 and communicate with one another.
  • the processor 21 may be a single-core or multi-core central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
  • CPU central processing unit
  • ASIC Application Specific Integrated Circuit
  • the memory 22 may be a high-speed RAM or a non-volatile memory, for example at least one disk storage.
  • the memory 22 is used to store the program 221.
  • the program 221 may include program code, where the program code includes computer operation instructions.
  • the communication interface 24 is configured to receive monitoring information sent by the operating system, where the monitoring information carries information about the monitored memory in the first node to which the node controller belongs, the monitored memory is a memory resource occupied by the target process on the first node, the operating system runs on each node of a server system consisting of at least two nodes including the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system.
  • the processor 21 runs the program 221 to: upon detecting that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, send the information of the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information of the accessed node.
  • for how the node controller provided by the embodiments of the present invention performs memory access, refer to the operation steps described in the foregoing method embodiments; details are not repeated here.
  • FIG. 5 is a schematic structural diagram of a server system according to an embodiment of the present invention. As shown in FIG. 5, the server system includes at least two nodes, for example node 1 and node 2, each including a node controller as shown in FIG. 3 or FIG. 4.
  • an operating system runs in the server system. If the operating system determines that a target process exists, it sends monitoring information to the node controller of the first node on which the target process runs, where the monitoring information carries information about the monitored memory in the first node to which that node controller belongs, the monitored memory is a memory resource occupied by the target process on the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system;
  • the operating system migrates the target process to the accessed node.
  • the server system can implement access to the memory in the following manner.
  • while the operating system is running, if it determines that a process has created shared memory, it uses that shared memory as the monitored memory and starts the shared memory monitoring module.
  • the shared memory monitoring module determines whether the process accesses memory resources of nodes other than the node to which it belongs; if so, the operating system starts the OS information reporting module.
  • the OS information reporting module is configured to send monitoring information to the BMC.
  • the monitoring information includes a CPU number, a node number, and information of the monitored memory.
  • the CPU number is the unique identifier of the CPU in the entire server system;
  • the node number is the unique identifier of the node in the entire server system;
  • the information of the monitored memory describes the memory area that the NC chip needs to monitor, usually as several groups of memory segments, each given by its start and end addresses. For example, a 1 KB memory space may occupy 2 segments in physical memory addressing, 0x0-0x200 and 0x400-0x600; the description information then consists of 2 sets of descriptions, namely 0x0-0x200 and 0x400-0x600.
  • the communication channel between the operating system and the BMC can be KCS or Block Transfer (BT).
  • the information sent by the OS information reporting module to the BMC, as well as the feedback information and migration instructions sent by the BMC to the operating system, are transmitted over KCS or BT.
  • after receiving the monitoring information, the BMC converts it into the values of specific registers in the NC chip through the conversion module and writes these values into the monitoring target registers of the NC chip through the register read/write module, so that the NC chip monitors the specified memory.
  • the NC chip monitors the frequency of access to the memory on other nodes by the process running on the monitored memory through the monitoring module of its network subsystem.
  • when the frequency reaches the threshold, the feedback module raises an alarm and sends the CPU number of the monitored CPU and the monitored memory information to the BMC, which forwards them to the operating system.
  • the monitoring register group of the NC chip stores the monitored CPU number and node number reported by the OS information reporting module, and the memory segment registers store the memory segment information of the monitored memory.
  • the NC chip may also include a target memory access counter, which counts the remote memory accesses made by the process; an alarm register, which indicates whether migration is required; and a feedback register, which provides the BMC with the CPU number, node number, and memory segment information of the process to be migrated, corresponding to the contents of the monitoring registers.
  • the operating system can set the threshold stored in the threshold register through the BMC.
  • the NC chip can periodically compare the target memory access counter with the value in the threshold register to determine whether the number of times the process performs remote memory access exceeds the threshold.
  • this period can be set as needed, for example 60 seconds or 120 seconds, but is not limited to these values.
  • the BMC periodically reads the registers of the NC chip. If it determines that the alarm register is set, it reads the feedback register, converts its value into the corresponding CPU number and memory segment information through the conversion module, and sends this information, together with a migration instruction, to the operating system.
  • the interval at which the BMC reads the alarm register can be set as needed, for example 60 seconds or 120 seconds, but is not limited to these values.
  • the operating system determines which process needs to be migrated according to the received CPU number and the memory segment information, and implements the migration of the process through the process migration module.
  • if the operating system determines that the memory resources of the CPU to which the accessed memory belongs are sufficient to run the target process, it migrates the target process to that CPU; if they are insufficient but the memory resources of another CPU in the accessed node are sufficient to run the target process, it migrates the target process to that other CPU of the accessed node.
  • the operating system may further determine whether the destination node, that is, the accessed node, has sufficient remaining memory resources to allocate to the target process.
  • if so, the operating system may migrate the target process to that portion of memory of the destination node.
  • if not, the migration may be temporarily deferred.
  • in that case, the memory may be checked periodically, and once the remaining memory resources are sufficient, the operating system migrates the target process to that portion of memory of the destination node.
  • the operating system can also determine whether the remaining memory resources in the memory of other CPUs in the destination node are sufficient to allocate to the target process. If so, the target process can be temporarily migrated to that memory while the accessed memory is checked periodically; once the remaining memory resources in the accessed memory are sufficient, the target process is migrated to the accessed memory.
  • the memory access method, device, and system provided by the embodiments of the present invention make full use of the CPU and memory resources of the migration destination node through process migration, improving resource utilization. In addition, because processes are migrated out of a node, if the CPU from which a process was migrated has no other processes to run, the energy consumption of the node can be reduced, saving energy.
  • all or some of the steps of the foregoing method embodiments can be implemented by a program instructing relevant hardware.
  • the aforementioned program can be stored in a computer readable storage medium. When the program is executed, the steps including the foregoing method embodiments are performed; and the foregoing storage medium includes: various media that can store program codes, such as ROM, RAM, disk or optical disk.
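The monitoring information described above (CPU number, node number, and the monitored memory as groups of memory segments) can be modeled as a small record. The Python sketch below is a hypothetical illustration: it assumes each segment is a half-open range [start, end), and the class name `MonitoringInfo` and the example CPU and node numbers are invented. It reproduces the 1 KB example made of the segments 0x0-0x200 and 0x400-0x600.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MonitoringInfo:
    cpu_number: int                        # unique CPU ID in the server system
    node_number: int                       # unique node ID in the server system
    segments: Tuple[Tuple[int, int], ...]  # (start, end) address of each segment

    def size(self) -> int:
        """Total number of monitored bytes across all segments."""
        return sum(end - start for start, end in self.segments)

    def contains(self, address: int) -> bool:
        """True if a physical address falls in any monitored segment."""
        # Assumption: segments are half-open ranges [start, end).
        return any(start <= address < end for start, end in self.segments)


# 1 KB of monitored memory split over two physical segments.
info = MonitoringInfo(cpu_number=3, node_number=1,
                      segments=((0x000, 0x200), (0x400, 0x600)))
print(info.size())  # 1024
```

Whether the end address is inclusive is not stated in the text; with an inclusive end, `contains` would use `<=` and `size` would add one byte per segment.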
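The NC chip registers described above (monitoring register group, memory segment registers, threshold register, target memory access counter, alarm register, and feedback register) and the periodic comparison can be sketched in software. This is a hypothetical Python model, not a real register map: the class and method names are invented, and resetting the counter at the end of every period is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Segment = Tuple[int, int]


@dataclass
class NcRegisters:
    """Hypothetical software model of the NC chip registers."""
    cpu_number: int = 0                  # monitoring register group
    node_number: int = 0
    segments: List[Segment] = field(default_factory=list)  # memory segment registers
    threshold: int = 0                   # threshold register, set by the OS via the BMC
    access_counter: int = 0              # target memory access counter
    alarm: bool = False                  # alarm register: True means migration required
    feedback: Optional[Tuple[int, int, Tuple[Segment, ...]]] = None  # read by the BMC

    def record_remote_access(self) -> None:
        """Called once for each remote access made from the monitored memory."""
        self.access_counter += 1

    def end_of_period(self) -> None:
        """Periodic comparison of the counter with the threshold register."""
        if self.access_counter >= self.threshold:
            self.alarm = True
            self.feedback = (self.cpu_number, self.node_number, tuple(self.segments))
        self.access_counter = 0  # assumption: counting restarts every period


regs = NcRegisters(cpu_number=3, node_number=1, segments=[(0x000, 0x200)], threshold=125)
for _ in range(125):
    regs.record_remote_access()
regs.end_of_period()
print(regs.alarm, regs.feedback)  # True (3, 1, ((0, 512),))
```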
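The BMC side of the exchange described above can be sketched as one polling pass, repeated at the configured interval (for example every 60 seconds). In this hypothetical Python sketch the IIC register reads and the KCS/BT transfer to the operating system are abstracted as callables, and the message layout and function name are invented for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical feedback register contents: (CPU number, node number, memory segments).
Feedback = Tuple[int, int, Tuple[Tuple[int, int], ...]]


def bmc_poll_once(read_alarm: Callable[[], bool],
                  read_feedback: Callable[[], Feedback],
                  send_to_os: Callable[[Dict[str, Any]], None]) -> bool:
    """One polling pass; returns True if a migration request was forwarded to the OS."""
    if not read_alarm():  # alarm register not set: nothing to report
        return False
    cpu_number, node_number, segments = read_feedback()  # feedback register
    send_to_os({          # stands in for the KCS/BT message to the operating system
        "cpu_number": cpu_number,
        "node_number": node_number,
        "segments": list(segments),
        "instruction": "migrate",
    })
    return True


sent = []
bmc_poll_once(lambda: True, lambda: (3, 1, ((0x000, 0x200),)), sent.append)
```

Passing the register accessors in as functions keeps the sketch independent of any particular IIC or IPMI library.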
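The migration rule described above (prefer the CPU that owns the accessed memory, fall back to another CPU of the accessed node, otherwise defer and check again later) can be sketched as follows. This is a hypothetical Python illustration: it assumes that "having the ability to run the target process" means having at least `required` units of free memory, and the names `Cpu` and `choose_destination` are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cpu:
    cpu_id: int       # CPU number, unique in the server system
    free_memory: int  # remaining memory resources of this CPU (assumed unit)


def choose_destination(owner: Cpu, node_cpus: List[Cpu], required: int) -> Optional[Cpu]:
    """Pick a CPU of the accessed node for the target process, or None to defer.

    Prefer the CPU that owns the accessed memory (remote access becomes local);
    otherwise use another CPU of the accessed node (remote access becomes adjacent).
    """
    if owner.free_memory >= required:
        return owner
    for cpu in node_cpus:
        if cpu.cpu_id != owner.cpu_id and cpu.free_memory >= required:
            return cpu
    return None  # defer the migration and check the memory again later


# The CPU owning the accessed memory is short of memory at first,
# so the process goes temporarily to another CPU of the accessed node.
owner, other = Cpu(cpu_id=8, free_memory=40), Cpu(cpu_id=9, free_memory=500)
first = choose_destination(owner, [owner, other], required=100)   # -> other
owner.free_memory = 200  # a later periodic check finds enough free memory
second = choose_destination(owner, [owner, other], required=100)  # -> owner
```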

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Provided are a memory access method, device and system. The memory access method comprises: a node controller receiving monitoring information sent by an operating system, wherein the monitoring information carries information about a monitored memory in a first node to which the node controller belongs, the monitored memory is a memory resource occupied by a target process on the first node, and the target process is a process which operates on a central processing unit (CPU) of the first node and accesses a memory of an accessed node other than the first node in a server system; if it is monitored that the frequency of access to the memory of the accessed node by the target process occupying the monitored memory is greater than or equal to a threshold value, then sending information about the accessed node to the operating system, so as to migrate the target process to the accessed node according to the information about the accessed node; and converting the remote memory access into local memory access or adjacent memory access, so that the time it takes for the target process to access the memory can be reduced, thereby effectively improving the performance of the server system.
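The threshold comparison summarized in the abstract can be made concrete. The Definitions above define the threshold as the average number of remote memory accesses per CPU of the first node within a preset time, T = R / N, where R is the total number of remote accesses and N the number of CPUs. The Python sketch below is a hypothetical illustration, not the NC chip's actual logic: it assumes the node controller sees one (process ID, accessed node) record per remote access during a statistical period, and the function names are invented. It reuses the worked example of eight CPUs and 1000 remote accesses, which gives T = 125.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def remote_access_threshold(total_remote_accesses: int, cpu_count: int) -> float:
    """T = R / N: average remote accesses per CPU of the first node in the preset time."""
    if cpu_count <= 0:
        raise ValueError("a node must contain at least one CPU")
    return total_remote_accesses / cpu_count


def targets_to_report(accesses: Iterable[Tuple[int, int]],
                      threshold: float) -> List[Tuple[int, int]]:
    """Return (pid, accessed_node) pairs whose count in one period is >= threshold.

    `accesses` holds one hypothetical (pid, accessed_node) record per remote access
    made from the monitored memory during a single statistical period.
    """
    counts = Counter(accesses)  # frequency value per (target process, accessed node)
    return sorted(key for key, n in counts.items() if n >= threshold)


# Worked example from the text: 8 CPUs, 1000 remote accesses in the preset time.
threshold = remote_access_threshold(1000, 8)  # 125.0
period = [(7, 2)] * 130 + [(7, 3)] * 5 + [(9, 2)] * 124
print(targets_to_report(period, threshold))  # [(7, 2)]
```

Process 7 reaches 130 accesses to node 2 and is reported; process 9 stops one access short of the threshold.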

Description

Memory access method, device and system
TECHNICAL FIELD
Embodiments of the present invention relate to computer technologies, and in particular, to a memory access method, apparatus, and system.
BACKGROUND OF THE INVENTION
With the development of computer technology, a server system may consist of one or more servers, each acting as a node, forming a non-uniform memory access (NUMA) architecture.
In a server system with a NUMA architecture, each node may include one or more central processing units (CPUs), and each CPU may be configured in advance with a certain amount of memory resources. A process running on a CPU can access the memory resources of the server system in three ways: local memory access, buddy (adjacent) memory access, and remote memory access.
Local memory access is a process running on a CPU accessing that CPU's own memory resources; adjacent (buddy) memory access is a process running on a CPU accessing the memory resources of another CPU in the same node; remote memory access is a process running on a CPU accessing the memory resources of a CPU in a node other than the one to which that CPU belongs.
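The three access types defined above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Cpu` record identified by a node number and a system-wide CPU number, and a hypothetical `classify_access` helper; neither comes from the patent, which defines the access types only in prose.

```python
from dataclasses import dataclass
from enum import Enum


class AccessType(Enum):
    LOCAL = "local"    # the CPU's own memory resources
    BUDDY = "buddy"    # memory of another CPU in the same node (adjacent access)
    REMOTE = "remote"  # memory of a CPU in a different node


@dataclass(frozen=True)
class Cpu:
    node_id: int  # node (server) the CPU belongs to
    cpu_id: int   # hypothetical system-wide CPU number


def classify_access(running_on: Cpu, memory_owner: Cpu) -> AccessType:
    """Classify an access by a process running on `running_on` to memory
    allocated to `memory_owner`."""
    if running_on == memory_owner:
        return AccessType.LOCAL
    if running_on.node_id == memory_owner.node_id:
        return AccessType.BUDDY
    return AccessType.REMOTE
```

For example, a process on CPU 0 of node 0 that touches memory owned by CPU 8 of node 1 is classified as `AccessType.REMOTE`, which is the case the method targets.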
Of these three methods, remote memory access takes the longest: it may take 3 to 20 times as long as local memory access. Therefore, when processes running on CPUs in an existing server system access memory resources through remote memory access, the overall performance of the system is degraded.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide a memory access method, apparatus, and system for improving the overall performance of a server system.
A first aspect of the embodiments of the present invention provides a memory access method, including:
receiving, by a node controller, monitoring information sent by an operating system, where the monitoring information carries information about monitored memory in a first node to which the node controller belongs; the monitored memory is the memory resource occupied by a target process on the first node; the operating system runs on each node of a server system consisting of at least two nodes including the first node; and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system; and
if the node controller detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, sending, by the node controller, information about the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information about the accessed node.
With reference to the memory access method provided in the first aspect, in a first possible implementation, the threshold is the ratio of the number of times that processes running on all CPUs of the first node access the memory of other nodes within a preset time to the total number of CPUs in the first node.
With reference to the first aspect or the first possible implementation, in a second possible implementation, the node controller is a node controller (NC) chip;
correspondingly, the receiving, by the node controller, of the monitoring information sent by the operating system includes:
receiving, by the NC chip through a baseboard management controller of the first node, the monitoring information sent by the operating system;
correspondingly, the sending, by the node controller, of the information about the accessed node to the operating system when it detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold includes:
if the NC chip detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, sending, by the NC chip, information about the accessed memory in the accessed node to the operating system through the baseboard management controller.
A second aspect of the embodiments of the present invention provides a node controller, including:
a receiving unit, configured to receive monitoring information sent by an operating system, where the monitoring information carries information about monitored memory in a first node to which the node controller belongs; the monitored memory is the memory resource occupied by a target process on the first node; the operating system runs on each node of a server system consisting of at least two nodes including the first node; and the target process is a process that runs on a CPU of the first node and accesses the memory of an accessed node other than the first node in the server system; and
a monitoring unit, configured to: when it detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, send information about the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information about the accessed node.
With reference to the node controller provided in the second aspect, in a first possible implementation, the threshold is the ratio of the number of times that processes running on all CPUs of the first node access the memory of other nodes within a preset time to the total number of CPUs in the first node.
With reference to the second aspect or the first possible implementation, in a second possible implementation, the receiving unit is specifically configured to:
receive, through the baseboard management controller of the first node, the monitoring information sent by the operating system;
and the monitoring unit is specifically configured to:
when it detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, send information about the accessed memory in the accessed node to the operating system through the baseboard management controller.
A third aspect of the embodiments of the present invention provides a server system, including at least two nodes, each including a node controller, where an operating system runs on the server system;
if the operating system determines that a target process exists, it sends monitoring information to the first node on which the target process runs, where the monitoring information carries information about monitored memory in the first node to which the node controller belongs; the monitored memory is the memory resource occupied by the target process on the first node; and the target process is a process that runs on a CPU of the first node and accesses the memory of an accessed node other than the first node in the server system; and
after receiving, from the first node, information about the accessed node that is accessed by the target process, the operating system migrates the target process to the accessed node.
With reference to the server system provided in the third aspect, in a first possible implementation, if the operating system determines that the memory resources of the CPU to which the accessed memory in the accessed node belongs are sufficient to run the target process, it migrates the target process to that CPU; if it determines that the memory resources of that CPU are not sufficient to run the target process but the memory resources of another CPU in the accessed node are, it migrates the target process to that other CPU of the accessed node.
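The CPU selection rule in this implementation can be sketched as follows, assuming that "memory resources sufficient to run the target process" is modeled as a simple comparison of free memory (`free_mb`) against the process's requirement (`required_mb`). The `CpuMemory` type, both field names, and the choice to return `None` (no migration) when no CPU in the accessed node qualifies are assumptions made for illustration only; the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CpuMemory:
    cpu_id: int
    free_mb: int  # hypothetical measure of the CPU's remaining memory resources


def choose_migration_cpu(
    accessed_cpu: CpuMemory,
    other_cpus_in_node: Sequence[CpuMemory],
    required_mb: int,
) -> Optional[int]:
    # Prefer the CPU that owns the accessed memory: after migration,
    # the target process's accesses become local accesses.
    if accessed_cpu.free_mb >= required_mb:
        return accessed_cpu.cpu_id
    # Otherwise fall back to another CPU of the accessed node:
    # the accesses become buddy (adjacent) accesses.
    for cpu in other_cpus_in_node:
        if cpu.free_mb >= required_mb:
            return cpu.cpu_id
    # Assumption: no CPU in the accessed node can run the process,
    # so it is left where it is.
    return None
```

Checking the owner of the accessed memory first reflects the ordering in the text: local access is preferred over adjacent access because it is the cheaper of the two.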
With the memory access method, apparatus, and system provided by the embodiments of the present invention, after receiving the monitoring information sent by the operating system, the node controller, upon detecting that the frequency with which the process occupying the monitored memory remotely accesses the memory of the accessed node is greater than or equal to the threshold, sends information about the accessed node to the operating system so that the operating system migrates the target process to the accessed node. Remote memory access is thus converted into local or adjacent memory access, which reduces the time the target process spends accessing memory and effectively improves the performance of the server system.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a flowchart of a memory access method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another memory access method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a node controller according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another node controller according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server system according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
To improve server system performance when processes use remote memory access, the embodiments of the present invention use a memory access method that migrates processes meeting certain conditions, converting remote memory access into local or adjacent memory access.
The embodiments of the present invention can be applied to a multi-server system consisting of at least two nodes, each of which includes a node controller as described in the embodiments. Each node may be a server, each server may include one or more CPUs, and each CPU may be allocated a portion of the memory resources. An operating system (OS) runs on the server system.
FIG. 1 is a flowchart of a memory access method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
101. The node controller receives the monitoring information sent by the operating system.
The monitoring information carries information about the monitored memory in the first node to which the node controller belongs; the monitored memory is the memory resource occupied by the target process on the first node; the operating system runs on each node of a server system consisting of at least two nodes including the first node; and the target process is a process that runs on a CPU of the first node and accesses the memory of an accessed node other than the first node in the server system.
Specifically, the operating system runs on every node of the server system, and one or more processes may run on the CPUs of each node. These processes may access memory through local, adjacent, or remote memory access.
The embodiments mainly address processes that perform remote memory access, so the "target process" in the embodiments refers to a process performing remote memory access, that is, a process that accesses memory resources in a node other than the node to which its CPU belongs. The "first node" in the embodiments is the node to which the node controller performing the method belongs; the name distinguishes it from other nodes to avoid confusion.
The operating system manages every node and the processes running on it from a global perspective, so it knows how the processes in each node are running: for example, which processes are running on a node, which CPU each runs on, and which part of which CPU's memory each occupies. The operating system also knows how each process accesses memory, including which part of the memory of which CPU on which node it accesses. Therefore, when a process performs remote memory access, the operating system is aware of it.
When the operating system finds that a target process exists among the processes running on the server system, it sends monitoring information to the node controller on the first node that runs the target process. The monitoring information includes information about the memory resources occupied by the target process on the first node; in the embodiments, this portion of memory is called the "monitored memory".
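A minimal sketch of how the operating system might identify target processes from its global view and group the monitoring information by first node is given below. The `ProcessView` record (with the occupied memory modeled as a `(start, end)` address range) and `build_monitoring_info` are hypothetical names; the patent does not specify the data structures the operating system uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Range = Tuple[int, int]  # (start, end) address range, end exclusive


@dataclass(frozen=True)
class ProcessView:
    pid: int
    node_id: int                  # node whose CPU runs the process
    local_range: Range            # memory the process occupies on that node
    accessed_nodes: FrozenSet[int]  # nodes whose memory the process accesses


def build_monitoring_info(processes: List[ProcessView]) -> Dict[int, List[Tuple[int, Range]]]:
    """Return {first_node_id: [(pid, monitored_range), ...]}.

    A process is a target process if it accesses memory on any node other
    than the node it runs on; its occupied range becomes the monitored memory.
    """
    info: Dict[int, List[Tuple[int, Range]]] = defaultdict(list)
    for p in processes:
        if any(n != p.node_id for n in p.accessed_nodes):
            info[p.node_id].append((p.pid, p.local_range))
    return dict(info)
```

Each entry would then be delivered to the node controller of the corresponding first node; only processes with at least one remote access are included.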
102. When the node controller detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, it sends information about the accessed node to the operating system. Step 102 is performed so that the operating system can migrate the target process to the accessed node according to that information.
Specifically, after receiving the monitoring information, the node controller monitors the monitored memory indicated in it. The first node can observe how the processes running on it access memory. The node controller need not monitor the process directly: because a process running on the first node occupies part of the memory of the CPU on which it runs, the node controller can monitor that memory instead.
In addition, when a process on the first node accesses memory on another node, the node controller can also obtain information about the accessed memory on that node. In the embodiments, the node to which the memory remotely accessed by the target process belongs is called the "accessed node", and the memory resource on another node that is accessed by the target process is called the "accessed memory".
The node controller monitors how the process running in the monitored memory accesses the memory of other nodes; specifically, it may monitor the frequency with which the target process accesses the memory of other nodes.
A statistical period and a threshold can be preset in the node controller. In each statistical period, the node controller counts the number of times the target process accesses the memory of the accessed node, which gives a frequency value. If the node controller determines that this frequency value is greater than or equal to the threshold, the target process is currently mainly using remote memory access. In this case, to improve the performance of the server system, the node controller sends the information about the accessed node to the operating system. This information may include information about the accessed memory in the accessed node that is accessed by the target process. However, because the operating system already knows which memory resources the target process accesses remotely, carrying the accessed-memory information is not mandatory.
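The per-period counting in step 102 can be sketched as below. This is a hypothetical software model, not NC chip logic: it assumes each remote request seen by the node controller carries an address that reveals whether it was issued from the monitored memory, and it models "sending the accessed node's information to the operating system" as a `report` callback. Class and method names are illustrative.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


class NodeControllerSketch:
    """Hypothetical model of the node controller's monitoring logic."""

    def __init__(self, threshold: float, report: Callable[[int], None]) -> None:
        self.threshold = threshold
        self.report = report  # stands in for "send accessed-node info to the OS"
        self.monitored: Optional[Tuple[int, int]] = None
        self.counts: Counter = Counter()  # accessed node id -> accesses this period

    def receive_monitoring_info(self, start: int, end: int) -> None:
        # Step 101: remember the memory range the target process occupies.
        self.monitored = (start, end)
        self.counts.clear()

    def on_remote_access(self, source_addr: int, accessed_node: int) -> None:
        # Assumption: a request is attributed to the target process when its
        # source address lies inside the monitored memory.
        if self.monitored is not None and self.monitored[0] <= source_addr < self.monitored[1]:
            self.counts[accessed_node] += 1

    def end_of_period(self) -> None:
        # Step 102: compare each per-period count with the threshold.
        for node, count in self.counts.items():
            if count >= self.threshold:
                self.report(node)
        self.counts.clear()
```

Counts are kept per accessed node, so if the target process remotely accesses several nodes, each node whose count reaches the threshold is reported; the text does not specify how the operating system chooses among several reported nodes.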
After receiving the information about the accessed node, the operating system may migrate the target process to the accessed node, so that after migration the target process accesses the memory of the accessed node through local or adjacent access, which effectively improves system performance.
Further, the operating system need not migrate the target process immediately: it may first check the remaining memory resources in the accessed node and migrate the target process only when those resources are sufficient to run it, so as to better improve system performance.
The operating system may migrate the process in a manner similar to the prior art; details are not described here.
With the memory access method provided by this embodiment, after receiving the monitoring information sent by the operating system, the node controller, upon detecting that the frequency with which the target process occupying the monitored memory remotely accesses the memory of the accessed node is greater than or equal to the threshold, sends information about the accessed node to the operating system so that the operating system migrates the target process to the accessed node. Remote memory access is thus converted into local or adjacent memory access, reducing the time the target process spends accessing memory and effectively improving the performance of the server system.
FIG. 2 is a flowchart of another memory access method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
201. The node controller (NC) chip in the first node receives, through the baseboard management controller, the monitoring information sent by the operating system.
Specifically, building on the implementation described in step 101, communication between the NC chip and the operating system is carried out through the baseboard management controller on the first node.
202. When the NC chip in the first node detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, it sends information about the accessed memory in the accessed node to the operating system through the baseboard management controller.
For details, refer to the implementation described in step 102.
Further, in a NUMA server system with many nodes, directly interconnecting the nodes would limit the connections between them by the number of interfaces on each node or by the interconnect protocol. Node controller (NC) chips can therefore be used to interconnect the nodes. Each node is provided with one NC chip, and nodes communicate with each other through their respective NC chips. CPUs on two different nodes can thus be interconnected at high speed through the NC chips, and the connections between nodes are no longer limited by the number of interfaces on each node. The NC chip provides functions such as cache coherence checking and message forwarding.
The operating system can read and write the registers of the NC chip on a node through the node's baseboard management controller (BMC). The BMC supports the industry-standard Intelligent Platform Management Interface (IPMI) specification, which describes management functions built into the motherboard, including local and remote diagnostics, console support, configuration management, hardware management, and troubleshooting. IPMI is an open, free standard for server management system design. IPMI provides several system interfaces, of which Keyboard Controller Style (KCS) is currently the most widely used. The operating system can communicate with the BMC through KCS, and the BMC can access the registers of the NC chip over the Inter-Integrated Circuit (IIC) bus to read and write them, thereby monitoring or controlling the state of the NC chip.
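The operating-system-to-BMC-to-NC register path described above can be pictured with the mock below. Nothing in it talks to real hardware: the register offsets (`REG_MON_START`, `REG_MON_END`, `REG_ACCESSED_NODE`), the `NO_NODE` sentinel, and the class names are hypothetical, IPMI/KCS and IIC transfers are reduced to plain method calls, and having the operating system poll a result register is an assumption (the text does not say whether the NC chip's result is polled or signaled).

```python
from typing import Dict

# Hypothetical register offsets; the real NC chip register map is not disclosed.
REG_MON_START = 0x00      # start of the monitored memory range
REG_MON_END = 0x08        # end (exclusive) of the monitored memory range
REG_ACCESSED_NODE = 0x10  # accessed node reported by the NC chip
NO_NODE = 0xFF            # hypothetical "nothing to report" value


class NcChipRegisters:
    """Stand-in for the NC chip's register file, reachable over IIC."""

    def __init__(self) -> None:
        self._regs: Dict[int, int] = {REG_ACCESSED_NODE: NO_NODE}

    def iic_read(self, offset: int) -> int:
        return self._regs.get(offset, 0)

    def iic_write(self, offset: int, value: int) -> None:
        self._regs[offset] = value


class BmcRelay:
    """Stand-in for the BMC: the OS reaches it (e.g. via IPMI over KCS)
    and it forwards register reads and writes to the NC chip over IIC."""

    def __init__(self, nc: NcChipRegisters) -> None:
        self.nc = nc

    def send_monitoring_info(self, start: int, end: int) -> None:
        # Step 201: OS -> BMC -> NC chip registers.
        self.nc.iic_write(REG_MON_START, start)
        self.nc.iic_write(REG_MON_END, end)

    def poll_accessed_node(self) -> int:
        # Step 202 (polling is an assumption): NC chip registers -> BMC -> OS.
        return self.nc.iic_read(REG_ACCESSED_NODE)
```

The point of the sketch is only the layering: the operating system never touches the NC chip directly, and every exchange is a register read or write relayed by the BMC.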
The NC chip on the first node monitors the monitored memory, including how the target process running in the monitored memory accesses the memory of other nodes.
Further, the threshold is the ratio of the number of times that processes running on all CPUs of the first node access the memory of other nodes within a preset time to the total number of CPUs in the first node.
Specifically, the threshold can be set as needed; a typical setting is as follows.
A length of time is preset in the first node, and the number of remote memory accesses made within that time by the processes running on all CPUs interconnected with the NC chip on the first node is counted. This count includes accesses both to shared memory and to non-shared memory in other nodes.
The ratio of this count to the number of CPUs on the first node is used as the threshold. In other words, if the NC chip of the first node detects that the number of remote memory accesses made by the target process running in the monitored memory is greater than or equal to the average number of remote memory accesses per CPU, the target process is performing remote memory access relatively frequently and should be migrated.
For example, suppose a node has 8 CPUs and 1000 remote memory access requests pass through its NC chip within the preset time; that is, all processes running on the 8 CPUs together make 1000 remote memory accesses. The average number of remote accesses per CPU within the preset time is therefore 1000 / 8 = 125.
The threshold is thus set to 125. If the target process makes 125 or more remote memory accesses within a statistical period, it is accessing remote memory relatively frequently and should be migrated.
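The threshold calculation and comparison in this example can be expressed as the following Python sketch; the function names are illustrative, not taken from the patent.

```python
def remote_access_threshold(remote_accesses_in_window: int, cpu_count: int) -> float:
    """Average number of remote memory accesses per CPU of the first node
    within the preset time window."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return remote_accesses_in_window / cpu_count


def should_migrate(target_remote_accesses: int, threshold: float) -> bool:
    # The text uses "greater than or equal to" the threshold.
    return target_remote_accesses >= threshold


threshold = remote_access_threshold(1000, 8)  # 8 CPUs, 1000 remote accesses
```

With these inputs the threshold is 125.0, so a target process with 125 remote accesses in a statistical period qualifies for migration while one with 124 does not.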
FIG. 3 is a schematic structural diagram of a node controller according to an embodiment of the present invention. As shown in FIG. 3, the node controller includes:
a receiving unit 11, configured to receive monitoring information sent by an operating system, where the monitoring information carries information about monitored memory in a first node to which the node controller belongs; the monitored memory is the memory resource occupied by a target process on the first node; the operating system runs on each node of a server system consisting of at least two nodes including the first node; and the target process is a process that runs on a CPU of the first node and accesses the memory of an accessed node other than the first node in the server system; and
a monitoring unit 12, configured to: when it detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, send information about the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information about the accessed node.
Further, the threshold with which the monitoring unit 12 compares the frequency of the target process's accesses to the memory of the accessed node is the ratio of the number of times that processes running on all CPUs of the first node access the memory of other nodes within a preset time to the total number of CPUs in the first node.
Further, the receiving unit 11 is specifically configured to:
receive, through the baseboard management controller of the first node, the monitoring information sent by the operating system;
and the monitoring unit 12 is specifically configured to:
when it detects that the frequency with which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, send information about the accessed memory in the accessed node to the operating system through the baseboard management controller.
FIG. 4 is a schematic structural diagram of another node controller according to an embodiment of the present invention. As shown in FIG. 4, the node controller includes:
a processor 21, a memory 22, a bus 23, and a communication interface 24. The processor 21, the memory 22, and the communication interface 24 are connected through the bus 23 and communicate with one another over it.
The processor 21 may be a single-core or multi-core central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 22 may be a high-speed RAM memory or a non-volatile memory, for example at least one disk storage device.
The memory 22 is configured to store a program 221. Specifically, the program 221 may include program code, and the program code includes computer operation instructions.
The communication interface 24 is configured to receive monitoring information sent by the operating system. The monitoring information carries information about the monitored memory in the first node to which the node controller belongs. The monitored memory is the memory resource occupied by the target process on the first node. The operating system runs on each node of a server system consisting of at least two nodes, including the first node. The target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system.
The processor 21 runs the program 221 to perform the following:
when it detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, sending information about the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to that information.
Specifically, for the memory access method performed by the node controller provided in the embodiments of the present invention, refer to the operation steps described in the corresponding method embodiments above; details are not repeated here.
FIG. 5 is a schematic structural diagram of a server system according to an embodiment of the present invention. As shown in FIG. 5, the server system includes at least two nodes, namely node 1 and node 2, each of which includes the node controller shown in FIG. 3 or FIG. 4.
An operating system runs in the server system. If the operating system determines that a target process exists, it sends monitoring information to the first node controller, that is, the node controller of the node on which the target process runs. The monitoring information carries information about the monitored memory in the node to which the first node controller belongs. The monitored memory is the memory resource occupied by the target process on the first node. The target process is a process that runs on a central processing unit (CPU) of the first node and accesses the memory of an accessed node other than the first node in the server system.
After receiving from the first node the information about the accessed node that the target process accesses, the operating system migrates the target process to the accessed node.
For details, refer to the implementations described in steps 101-102 or steps 201-202. Further, in practical applications, the server system may implement memory access as follows.
While the operating system is running, if it determines that a process has created shared memory, it treats that shared memory as the monitored memory and starts the shared memory monitoring module. The shared memory monitoring module determines whether the process accesses memory resources of nodes other than the node to which it belongs. If so, the operating system starts the OS information reporting module, which sends the monitoring information to the BMC.
The monitoring information includes a CPU number, a node number, and information about the monitored memory. The CPU number uniquely identifies the CPU in the entire server system. The node number uniquely identifies the node in the entire server system. The information about the monitored memory describes the memory area that the NC chip needs to monitor, usually as the start and end physical addresses of several memory segments. For example, a 1 KB memory space may occupy two segments in physical memory addressing, one from 0x0 to 0x200 and the other from 0x400 to 0x600. The description then consists of two entries: 0x0-0x200 and 0x400-0x600.
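The following Python sketch models the monitoring information and the check that decides whether a process is a target process. It represents the CPU number, the node number, and a list of (start, end) physical-address segments. The class and function names are assumptions for illustration. End addresses are treated as exclusive, which makes the two example segments add up to 1 KB; this is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MonitoringInfo:
    cpu_number: int   # unique CPU ID across the whole server system
    node_number: int  # unique node ID across the whole server system
    segments: list[tuple[int, int]] = field(default_factory=list)  # (start, end) addresses

    def total_bytes(self) -> int:
        # End addresses are treated as exclusive, so 0x0-0x200 covers 0x200 bytes.
        return sum(end - start for start, end in self.segments)


def build_monitoring_info(cpu_number: int, home_node: int,
                          accessed_nodes: set[int],
                          shm_segments: list[tuple[int, int]]) -> Optional[MonitoringInfo]:
    """Return monitoring info only if the process touches memory outside its own node."""
    if not (accessed_nodes - {home_node}):
        return None  # purely local process: not a target process, nothing to report
    return MonitoringInfo(cpu_number, home_node, list(shm_segments))


info = build_monitoring_info(cpu_number=3, home_node=0, accessed_nodes={0, 1},
                             shm_segments=[(0x0, 0x200), (0x400, 0x600)])
print(info.total_bytes())  # 1024, i.e. the 1 KB example above
```

A process whose accessed nodes include only its home node yields `None`, so no monitoring information is sent to the BMC for it.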
The communication channel between the operating system and the BMC may use KCS (Keyboard Controller Style) or block transfer (One-Block Transfer, BT). In the monitoring flow, all messages travel over KCS or BT: the information sent by the OS information reporting module to the BMC, and the feedback information and migration instructions sent by the BMC to the operating system.
After receiving the monitoring information, the BMC uses its conversion module to convert it into the values of specific registers in the NC chip. It then writes these values into the NC chip's monitoring target registers through its register read/write module, and the NC chip performs the monitoring.
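The text does not specify the NC chip's register layout. Purely as an illustration, the following Python sketch shows the kind of flattening the BMC conversion module performs. The register names (`MON_CPU`, `MON_SEG0_START`, and so on), the limit of four segments, and the dictionary standing in for the register read/write module are all hypothetical.

```python
def to_register_writes(cpu_number: int, node_number: int,
                       segments: list[tuple[int, int]],
                       max_segments: int = 4) -> list[tuple[str, int]]:
    """Flatten monitoring info into (register name, value) pairs.
    Register names and the segment limit are hypothetical."""
    if len(segments) > max_segments:
        raise ValueError("more segments than available monitoring target registers")
    writes = [("MON_CPU", cpu_number), ("MON_NODE", node_number),
              ("MON_SEG_COUNT", len(segments))]
    for i, (start, end) in enumerate(segments):
        if end <= start:
            raise ValueError(f"segment {i} is empty or reversed")
        writes.append((f"MON_SEG{i}_START", start))
        writes.append((f"MON_SEG{i}_END", end))
    return writes


def write_registers(regs: dict[str, int], writes: list[tuple[str, int]]) -> None:
    """Stand-in for the BMC's register read/write module."""
    for name, value in writes:
        regs[name] = value


nc_regs: dict[str, int] = {}
write_registers(nc_regs, to_register_writes(3, 0, [(0x0, 0x200), (0x400, 0x600)]))
print(hex(nc_regs["MON_SEG1_START"]))  # 0x400
```

Validating segments at the BMC rejects malformed monitoring information before it reaches the hardware. Whether the real conversion module does this is not stated.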
Based on this register information, the NC chip uses the monitoring module of its network subsystem to monitor how frequently the process occupying the monitored memory accesses memory on other nodes. When that frequency exceeds the threshold, the feedback module raises an alarm. It then sends the CPU number of the monitored CPU and the information about the monitored memory to the BMC, which forwards them to the operating system.
Several registers must be added to the NC chip:

- A threshold register stores the threshold.
- A monitoring register group stores the monitored CPU number and node number corresponding to the OS information reporting module. Its memory segment registers store the segment information of the monitored memory. The group may also include a target memory access counter that counts the process's remote memory accesses.
- An alarm register indicates whether migration is required.
- A feedback register gives the BMC the CPU number, node number, and memory segment information of the process to be migrated. Its contents correspond to those of the monitoring registers.
In addition, the operating system can set the value stored in the threshold register through the BMC. The NC chip can periodically compare the target memory access counter with the value in the threshold register to determine whether the process's remote memory accesses exceed the threshold. The period can be set as needed, for example 60 seconds or 120 seconds, but is not limited to these values.
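The register set and the periodic comparison can be modeled in a few lines of Python. This is a behavioral sketch, not the NC chip's actual logic, and it uses the greater-than-or-equal test from the claims. Three details are assumptions not stated in the text:

- the per-access hook `record_remote_access`;
- per-node tallies, used to report which remote node was accessed most, since the claims require the accessed node's information to reach the operating system;
- resetting the counter at the end of each period.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NcMonitorRegisters:
    threshold: int                    # threshold register, set by the OS through the BMC
    cpu_number: int                   # monitoring register group: monitored CPU number
    node_number: int                  # monitoring register group: node number
    segments: list[tuple[int, int]]   # memory segment registers of the monitored memory
    access_counter: int = 0           # target memory access counter
    alarm: bool = False               # alarm register: is migration needed?
    feedback: dict = field(default_factory=dict)        # feedback register read by the BMC
    per_node: Counter = field(default_factory=Counter)  # assumption: remote accesses per node

    def record_remote_access(self, accessed_node: int) -> None:
        """Hypothetical hook: one remote access by the monitored process."""
        self.access_counter += 1
        self.per_node[accessed_node] += 1

    def periodic_check(self) -> None:
        """Compare the counter with the threshold once per period (e.g. 60 s or 120 s)."""
        if self.access_counter and self.access_counter >= self.threshold:
            self.alarm = True
            # Feedback mirrors the monitoring registers, plus the most-accessed remote node.
            self.feedback = {
                "cpu_number": self.cpu_number,
                "node_number": self.node_number,
                "segments": list(self.segments),
                "accessed_node": self.per_node.most_common(1)[0][0],
            }
        # Assumption: each period starts a fresh counting window.
        self.access_counter = 0
        self.per_node.clear()


regs = NcMonitorRegisters(threshold=3, cpu_number=3, node_number=0, segments=[(0x0, 0x200)])
for node in (1, 1, 2):
    regs.record_remote_access(node)
regs.periodic_check()
print(regs.alarm, regs.feedback["accessed_node"])  # True 1
```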
The BMC periodically reads the values in the NC chip's registers. If it finds that the alarm register is set, it does the following:

1. Reads the values in the feedback register.
2. Converts them, through the conversion module, into the corresponding CPU number and memory segment information.
3. Sends that information to the operating system together with a migration instruction.

The interval at which the BMC reads the alarm register can be set as needed, for example 60 seconds or 120 seconds, but is not limited to these values. After receiving the migration instruction, the operating system determines from the received CPU number and memory segment information which process needs to be migrated. It then migrates that process through the process migration module.
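The BMC polling step and the operating system's lookup of the process to migrate might look like the following Python sketch. The text only states that the BMC periodically reads the alarm and feedback registers and that the OS identifies the process from the CPU number and memory segment information. Everything else is assumed: the dictionary-based register and process-table representations, the function names, and clearing the alarm register after it is read.

```python
from typing import Optional


def bmc_poll(nc_regs: dict) -> Optional[dict]:
    """One polling step: if the alarm register is set, read the feedback register
    and turn it into a migration instruction for the operating system."""
    if not nc_regs.get("alarm"):
        return None
    feedback = nc_regs["feedback"]
    nc_regs["alarm"] = False  # assumption: the BMC clears the alarm once it has read it
    return {"op": "migrate",
            "cpu_number": feedback["cpu_number"],
            "segments": feedback["segments"]}


def resolve_process(instruction: dict, process_table: list[dict]) -> Optional[int]:
    """OS side: find the process whose CPU number and memory segments match."""
    for proc in process_table:
        if (proc["cpu_number"] == instruction["cpu_number"]
                and proc["segments"] == instruction["segments"]):
            return proc["pid"]
    return None  # no match: the process may already have exited


nc_regs = {"alarm": True, "feedback": {"cpu_number": 3, "segments": [(0x0, 0x200)]}}
table = [{"pid": 101, "cpu_number": 2, "segments": [(0x0, 0x200)]},
         {"pid": 202, "cpu_number": 3, "segments": [(0x0, 0x200)]}]
instruction = bmc_poll(nc_regs)
print(resolve_process(instruction, table))  # 202
print(bmc_poll(nc_regs))                    # None: alarm already cleared
```

Matching on both the CPU number and the segments matters because several processes can share the same memory segments. In this example, PID 101 uses the same segment but runs on a different CPU.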
Further, the operating system places the target process on the accessed node as follows:

- If the memory resources of the CPU to which the accessed memory belongs are capable of running the target process, it migrates the target process to that CPU.
- If that CPU's memory resources are not capable of running the target process but the memory resources of another CPU in the accessed node are, it migrates the target process to that other CPU.
Specifically, in practical applications, after receiving the migration instruction, the operating system may further determine whether the destination node, that is, the accessed node, has enough memory resources to allocate to the target process.
If the remaining memory resources in the accessed memory of the destination node are sufficient to allocate to the target process, the accessed memory is capable of running the target process. The operating system can then migrate the target process to that portion of the destination node's memory.
If the remaining memory resources in the accessed memory of the destination node are not sufficient to allocate to the target process, the accessed memory is not capable of running it, and migration may be deferred for the time being. The operating system may also keep checking that portion of memory periodically. Once the remaining memory resources are sufficient, it can migrate the target process to that portion of the destination node's memory.
The operating system can also determine whether the remaining memory resources in the memory of other CPUs in the destination node are sufficient to allocate to the target process. If so, it can temporarily migrate the target process to that memory while continuing to check the accessed memory periodically. Once the remaining memory resources in the accessed memory are sufficient, it migrates the target process to the accessed memory.
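The placement policy in the preceding paragraphs can be summarized as a small decision function. In this Python sketch, "capable of running the target process" is approximated as having at least the required amount of free memory. The function name and the megabyte units are illustrative assumptions. A result of `None` corresponds to deferring migration and re-checking later.

```python
from typing import Optional


def choose_migration_target(required_mb: int, accessed_cpu: int,
                            free_mb_by_cpu: dict[int, int]) -> Optional[tuple[int, bool]]:
    """Pick a CPU on the accessed (destination) node for the target process.

    Returns (cpu, is_final). is_final is True when the process lands on the CPU that
    owns the accessed memory. It is False for a temporary placement on another CPU of
    the same node, after which the OS keeps re-checking the accessed memory.
    Returns None when no CPU on the node has room, i.e. migration is deferred.
    """
    # Preferred: the CPU whose local memory is the one being accessed remotely.
    if free_mb_by_cpu.get(accessed_cpu, 0) >= required_mb:
        return accessed_cpu, True
    # Fallback: any other CPU on the destination node with enough free memory.
    for cpu, free_mb in sorted(free_mb_by_cpu.items()):
        if cpu != accessed_cpu and free_mb >= required_mb:
            return cpu, False
    return None


print(choose_migration_target(512, 4, {4: 256, 5: 1024, 6: 2048}))  # (5, False)
print(choose_migration_target(512, 4, {4: 1024}))                   # (4, True)
print(choose_migration_target(512, 4, {4: 100, 5: 100}))            # None
```

Iterating over CPUs in sorted order only makes the fallback deterministic. The text does not say how the operating system chooses among several eligible CPUs.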
With the memory access method, device, and system provided in the embodiments of the present invention, process migration makes full use of the CPU and memory resources of the destination node, which improves resource utilization. In addition, once processes are migrated out of a node, any CPU they left that has no other processes to run can reduce the node's energy consumption, which saves power.
A person of ordinary skill in the art can understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand the following. They may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some or all of their technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A memory access method, comprising:
receiving, by a node controller, monitoring information sent by an operating system, wherein the monitoring information carries information about monitored memory in a first node to which the node controller belongs, the monitored memory is a memory resource occupied by a target process on the first node, the operating system runs on each node of a server system consisting of at least two nodes including the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses memory of an accessed node other than the first node in the server system; and
if the node controller detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, sending, by the node controller, information about the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information about the accessed node.
2. The memory access method according to claim 1, wherein the threshold is the ratio of the number of accesses to memory of other nodes, made within a preset time by processes running on all CPUs in the first node, to the total number of CPUs in the first node.
3. The memory access method according to claim 1 or 2, wherein the node controller is a node controller (NC) chip;
correspondingly, the receiving, by the node controller, of the monitoring information sent by the operating system comprises: receiving, by the NC chip through a mainboard management control unit of the first node, the monitoring information sent by the operating system; and
correspondingly, the sending, by the node controller, of the information about the accessed node to the operating system if the node controller detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold comprises:
if the NC chip detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, sending, by the NC chip, information about the accessed memory in the accessed node to the operating system through the mainboard management control unit.
4. A node controller, comprising:
a receiving unit, configured to receive monitoring information sent by an operating system, wherein the monitoring information carries information about monitored memory in a first node to which the node controller belongs, the monitored memory is a memory resource occupied by a target process on the first node, the operating system runs on each node of a server system consisting of at least two nodes including the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses memory of an accessed node other than the first node in the server system; and
a monitoring unit, configured to: when it detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to a threshold, send information about the accessed node to the operating system, so that the operating system migrates the target process to the accessed node according to the information about the accessed node.
5. The node controller according to claim 4, wherein the threshold is the ratio of the number of accesses to memory of other nodes, made within a preset time by processes running on all CPUs in the first node, to the total number of CPUs in the first node.
6. The node controller according to claim 4 or 5, wherein the receiving unit is specifically configured to:
receive, through a mainboard management control unit of the first node, the monitoring information sent by the operating system; and
the monitoring unit is specifically configured to:
when it detects that the frequency at which the target process occupying the monitored memory accesses the memory of the accessed node is greater than or equal to the threshold, send information about the accessed memory in the accessed node to the operating system through the mainboard management control unit.
7. A server system, comprising at least two nodes, each of which includes the node controller according to any one of claims 4 to 6, wherein an operating system runs in the server system;
if the operating system determines that a target process exists, the operating system sends monitoring information to the first node on which the target process runs, wherein the monitoring information carries information about monitored memory in the first node to which the node controller belongs, the monitored memory is a memory resource occupied by the target process on the first node, and the target process is a process that runs on a central processing unit (CPU) of the first node and accesses memory of an accessed node other than the first node in the server system; and
after receiving, from the first node, information about the accessed node accessed by the target process, the operating system migrates the target process to the accessed node.
8. The server system according to claim 7, wherein if the operating system determines that the memory resources of the CPU to which the accessed memory in the accessed node belongs are capable of running the target process, the operating system migrates the target process to that CPU; and if the operating system determines that the memory resources of the CPU to which the accessed memory belongs are not capable of running the target process and the memory resources of another CPU in the accessed node are capable of running it, the operating system migrates the target process to that other CPU of the accessed node.
PCT/CN2014/071252 2013-06-25 2014-01-23 Memory access method, device and system WO2014206078A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310257057.5 2013-06-25
CN201310257057.5A CN103365717B (en) 2013-06-25 2013-06-25 Memory pool access method, Apparatus and system

Publications (1)

Publication Number Publication Date
WO2014206078A1 true WO2014206078A1 (en) 2014-12-31

Family

ID=49367142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/071252 WO2014206078A1 (en) 2013-06-25 2014-01-23 Memory access method, device and system

Country Status (2)

Country Link
CN (1) CN103365717B (en)
WO (1) WO2014206078A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365717B (en) * 2013-06-25 2016-08-31 华为技术有限公司 Memory pool access method, Apparatus and system
CN104917784B (en) * 2014-03-10 2018-06-05 华为技术有限公司 A kind of data migration method, device and computer system
CN104035823B (en) * 2014-06-17 2018-06-26 华为技术有限公司 Load-balancing method and device
CN104516952B (en) * 2014-12-12 2018-02-13 华为技术有限公司 A kind of memory partitioning dispositions method and device
CN106708551B (en) * 2015-11-17 2020-01-17 华为技术有限公司 Configuration method and system for CPU (central processing unit) of hot-adding CPU (central processing unit)
CN105590063B (en) * 2015-12-25 2019-03-22 珠海豹趣科技有限公司 A kind of method, apparatus and electronic equipment for excavating loophole
CN106020971B (en) * 2016-05-10 2020-01-31 广东睿江云计算股份有限公司 CPU scheduling method and device in cloud host system
CN107577530A (en) * 2016-07-04 2018-01-12 中兴通讯股份有限公司 Board, the method and system of balanced board memory usage
CN113626214B (en) * 2021-07-16 2024-02-09 浪潮电子信息产业股份有限公司 Information transmission method, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269390B1 (en) * 1996-12-17 2001-07-31 Ncr Corporation Affinity scheduling of data within multi-processor computer systems
CN102369511A (en) * 2011-09-01 2012-03-07 华为技术有限公司 Resource removing method, device and system
CN102984762A (en) * 2012-12-12 2013-03-20 中国联合网络通信集团有限公司 Method and device for function allocation of IMS
CN103036959A (en) * 2012-12-07 2013-04-10 武汉邮电科学研究院 Realization method and realization system of distributed deployment application program based on input/output (IO) decoupling
CN103365717A (en) * 2013-06-25 2013-10-23 华为技术有限公司 Memory access method, device and system

Also Published As

Publication number Publication date
CN103365717B (en) 2016-08-31
CN103365717A (en) 2013-10-23

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14817143

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 14817143

Country of ref document: EP

Kind code of ref document: A1