CN118279126B - Graphics processing unit video memory processing method, server, product, equipment and medium
- Publication number
- CN118279126B (application CN202410696331.7A)
- Authority
- CN
- China
- Prior art keywords
- memory
- processing unit
- host
- physical page
- graphic processing
- Prior art date
- Legal status (assumed; not a legal conclusion)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/0292—User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0646—Configuration or reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
- G06F12/0882—Page mode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0631—Configuration or reconfiguration of storage systems by allocating resources to storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a graphics processing unit video memory processing method, a server, a product, a device, and a medium, and relates to the technical field of graphics processing units. When the video memory of the graphics processing unit cannot satisfy a required memory request, physical pages in the video memory whose access frequency is below a preset count are migrated to host memory, which improves the availability and utilization of the graphics processing unit. Because the access frequency of each physical page is recorded, the access pattern of every physical page can be determined accurately, and the first physical pages in the video memory of the graphics processing unit can therefore be moved to host memory precisely. Since the host and the graphics processing unit move physical pages over the computing fast link protocol and based on access frequency, memory scheduling between the graphics processing unit video memory and host memory is transparent to upper layers, which improves the generality of graphics processing unit video memory processing and meets the processing requirements of different AI models.
Description
Technical Field
The present invention relates to the technical field of graphics processing units, and in particular to a graphics processing unit video memory processing method, a server, a product, a device, and a medium.
Background
Against the background of today's large models, the demand for computing resources keeps increasing, and the demand for video memory is particularly urgent. Large models consume a great deal of graphics processing unit (Graphics Processing Unit, GPU) video memory during both training and inference. To meet a large model's demand for GPU video memory, the number of GPUs in use can simply be enlarged, but this increases cost and energy consumption; it has therefore been proposed to use central processing unit (Central Processing Unit, CPU) memory as a substitute for part of the GPU video memory.
However, in artificial intelligence (Artificial Intelligence, AI) model training scenarios, the existing schemes for exchanging data between GPU video memory and CPU memory are all based on the peripheral component interconnect express (Peripheral Component Interconnect Express, PCIe) interconnection protocol, so their efficiency is low and easily limited by PCIe performance, and the data the model will use next must be loaded from the CPU into GPU video memory.
Therefore, how to solve the problem of limited available GPU video memory capacity, improve GPU video memory utilization, and adapt to the training requirements of different AI models is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The object of the present invention is to provide a graphics processing unit video memory processing method, a server, a product, a device, and a medium, so as to solve the technical problems that the available video memory capacity of the graphics processing unit is limited, the video memory utilization of the graphics processing unit is low, and existing methods for addressing the limited available video memory capacity are not applicable to different AI models and lack generality.
To solve the above technical problems, the present invention provides a graphics processing unit video memory processing method applied to a graphics processing unit, wherein the graphics processing unit is connected to a host, both the graphics processing unit and the host include computing fast link interfaces, and the graphics processing unit includes a recorder connected respectively to a device-side coherence engine, a device-side memory management unit, and a device-side page table buffer; the method includes the following steps:
Acquiring a memory request required for processing an artificial intelligence model, wherein the memory request includes a request directed to the video memory of the graphics processing unit and/or a request directed to the memory of the host;
When it is detected that the video memory capacity of the graphics processing unit does not meet a preset requirement, acquiring the access frequency of each physical page recorded by the recorder, wherein the physical pages include host memory pages accessed by the graphics processing unit as recorded in the device-side coherence engine, device-side physical memory pages recorded by the device-side memory management unit, and physical pages looked up through the device-side page table buffer;
Determining first physical pages in the video memory of the graphics processing unit according to the access frequency of each physical page recorded by the recorder, wherein a first physical page is a physical page whose access frequency is lower than a preset count;
Moving the first physical pages in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, so as to complete the response to the memory request.
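The steps above can be sketched in code. The following is a minimal illustrative model only, not the patented implementation: the page identifiers, threshold value, and stand-in for the computing fast link transfer are all hypothetical.

```python
# Illustrative sketch of the claimed flow: when GPU video memory cannot
# satisfy a request, pages whose recorded access frequency is below a
# preset count ("first physical pages") are migrated to host memory.

PRESET_COUNT = 3  # access-frequency threshold (hypothetical value)

def handle_memory_request(request_pages, gpu_free_pages, access_freq,
                          gpu_pages, host_pages):
    """access_freq maps page-id -> recorded access count; returns free pages left."""
    if request_pages <= gpu_free_pages:
        return gpu_free_pages - request_pages  # capacity already sufficient
    # Determine the "first physical pages": colder than the preset count.
    first_pages = [p for p in gpu_pages if access_freq.get(p, 0) < PRESET_COUNT]
    # Move them to host memory (stand-in for the computing fast link transfer).
    for p in first_pages:
        gpu_pages.remove(p)
        host_pages.append(p)
    return gpu_free_pages + len(first_pages) - request_pages

gpu = ["a", "b", "c", "d"]
host = []
freq = {"a": 10, "b": 1, "c": 0, "d": 7}
remaining = handle_memory_request(request_pages=3, gpu_free_pages=1,
                                  access_freq=freq, gpu_pages=gpu,
                                  host_pages=host)
# Pages "b" and "c" (access frequency below 3) end up in host memory.
```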
In one aspect, recording the access frequency of each physical page by the recorder includes:
storing the access frequency of each physical page in the recorder according to a bitmap data structure;
and after the access frequency of each physical page is recorded by the recorder, the method further includes:
storing the access-frequency data recorded by the recorder in the memory of the host according to the bitmap data structure through a first protocol, wherein the first protocol is the cache protocol of the computing fast link protocol.
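One plausible reading of the claimed bitmap structure (the patent does not fix the layout) is a bit per physical page per sampling window, so that a page's frequency is the number of windows in which it was accessed. The sketch below assumes that reading; page counts and window contents are invented.

```python
# Bitmap-style access record: one bit per physical page per sampling window,
# packed into a bytearray. This is an assumed layout, not the actual design.

NUM_PAGES = 16

def new_bitmap():
    return bytearray((NUM_PAGES + 7) // 8)

def mark_accessed(bitmap, page):
    bitmap[page // 8] |= 1 << (page % 8)

def was_accessed(bitmap, page):
    return bool(bitmap[page // 8] & (1 << (page % 8)))

# The recorder keeps one bitmap per window; frequency = count of set bits.
windows = []
for accessed_pages in ([1, 5], [1], [1, 9]):   # three sampling windows
    bm = new_bitmap()
    for p in accessed_pages:
        mark_accessed(bm, p)
    windows.append(bm)

def frequency(page):
    return sum(was_accessed(bm, page) for bm in windows)

# Synchronizing to host memory via the first (cache) protocol would copy the
# same compact bitmaps; a plain copy stands in for that transfer here.
host_copy = [bytes(bm) for bm in windows]
```

Because each window costs only NUM_PAGES/8 bytes, the bitmap form keeps the recorder and its host-side backup small, which matches the space-saving motivation stated later in the description.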
In another aspect, the method further includes:
When it is detected that the video memory capacity of the graphics processing unit meets the preset requirement, determining second physical pages of the host according to the frequency, recorded in the recorder, with which the graphics processing unit accesses the memory pages of the host, wherein a second physical page is a physical page whose access frequency is greater than or equal to the preset count;
and migrating the second physical pages of the host to the video memory of the graphics processing unit through the first protocol.
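The reverse path can be sketched the same way. This is an illustrative model only; the threshold and page names are hypothetical, and a list move stands in for the first-protocol transfer.

```python
# When GPU video memory capacity again meets the preset requirement, host
# pages the GPU has accessed at least PRESET_COUNT times ("second physical
# pages") are migrated back into GPU video memory.

PRESET_COUNT = 3

def migrate_hot_pages(host_pages, gpu_pages, gpu_access_freq, gpu_capacity_ok):
    if not gpu_capacity_ok:
        return []                      # only runs when capacity is sufficient
    second_pages = [p for p in list(host_pages)
                    if gpu_access_freq.get(p, 0) >= PRESET_COUNT]
    for p in second_pages:
        host_pages.remove(p)
        gpu_pages.append(p)            # stand-in for the first-protocol move
    return second_pages

host = ["x", "y", "z"]
gpu = []
freq = {"x": 5, "y": 1, "z": 3}
moved = migrate_hot_pages(host, gpu, freq, gpu_capacity_ok=True)
```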
In another aspect, before moving the first physical pages in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, and/or before migrating the second physical pages of the host to the video memory of the graphics processing unit through the first protocol, the method further includes:
obtaining a first request and a second request directed to the host, wherein the first request represents an access to the memory of the host, and the second request represents synchronizing the access-frequency data of each physical page recorded in the recorder to the memory of the host;
and packaging the first request and the second request together and sending them to the computing fast link interface in the graphics processing unit.
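The packaging step can be sketched as framing the two host-bound requests into a single packet so that one link transaction carries both. The packet layout below is invented purely for illustration.

```python
# Combine a host-memory access request and an access-frequency sync request
# into one packet for the computing fast link interface (hypothetical format).

def package_requests(access_request, sync_request):
    # One framed packet carrying both requests reduces link transactions.
    return {"type": "packed", "payload": [access_request, sync_request]}

def unpack(packet):
    assert packet["type"] == "packed"
    return packet["payload"]

pkt = package_requests({"op": "read", "addr": 0x1000},
                       {"op": "sync_freq", "pages": [1, 2]})
first, second = unpack(pkt)
```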
In another aspect, the method further comprises:
When it is determined, according to the access frequency of each physical page recorded by the recorder, that the video memory of the graphics processing unit contains no first physical page, accessing the memory of the host through the first protocol, with the host memory pages accessed by the graphics processing unit recorded in the device-side coherence engine.
In another aspect, moving the first physical pages in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol includes:
obtaining the remaining video memory capacity that the graphics processing unit would have after a preset number of first physical pages, selected from all first physical pages in the video memory of the graphics processing unit, are migrated, wherein the preset number is smaller than the total number of first physical pages in the video memory of the graphics processing unit;
judging whether the remaining video memory capacity meets the preset requirement;
if not, returning to the step of obtaining the remaining video memory capacity after a further preset number of first physical pages to be migrated are selected from all first physical pages in the video memory of the graphics processing unit;
and if so, moving each first physical page to be migrated to the memory of the host through the first protocol.
In another aspect, moving each first physical page to be migrated to the memory of the host through the first protocol includes:
recording the information of each first physical page to be migrated, from the time the judgment of whether the remaining video memory capacity meets the preset requirement begins until the remaining video memory capacity meets the preset requirement;
and after it is judged that the remaining capacity meets the preset requirement, moving all the first physical pages to be migrated to the memory of the host at the same time through the first protocol, according to the recorded information of each first physical page to be migrated.
In another aspect, determining the preset number of first physical pages to be migrated from all first physical pages in the video memory of the graphics processing unit includes:
acquiring the priority order of all first physical pages in the video memory of the graphics processing unit;
selecting the preset number of first physical pages from all first physical pages in the video memory of the graphics processing unit according to the priority order;
and taking the selected preset number of first physical pages as the first physical pages to be migrated.
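The incremental loop above can be sketched as follows: pick a preset batch of first physical pages in priority order, recompute the remaining capacity as if they were migrated, and only when the requirement is met move the whole recorded batch at once. Batch size, capacity values, and page names are hypothetical.

```python
# Incremental migration planning: record batches of first physical pages
# (in priority order) until the projected remaining capacity meets the
# preset requirement, then migrate the whole batch in one operation.

BATCH = 2            # preset number of pages selected per round (assumed)
REQUIRED_FREE = 5    # preset capacity requirement, in pages (assumed)

def plan_migration(first_pages_by_priority, free_pages):
    to_migrate = []
    remaining = list(first_pages_by_priority)
    while free_pages + len(to_migrate) < REQUIRED_FREE and remaining:
        batch, remaining = remaining[:BATCH], remaining[BATCH:]
        to_migrate.extend(batch)          # record only; do not move yet
    if free_pages + len(to_migrate) < REQUIRED_FREE:
        raise RuntimeError("not enough first physical pages to migrate")
    return to_migrate                     # moved to the host in one batch

plan = plan_migration(["p3", "p7", "p1", "p9", "p4"], free_pages=2)
```

Migrating only as many cold pages as the capacity check requires, and moving them in a single batch, matches the advantage claimed later: less data migrated and fewer per-move disruptions than moving every first physical page individually.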
In another aspect, after moving the first physical pages in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, the method further includes:
When an access to a first physical page that was migrated from the graphics processing unit to the memory of the host is detected, accessing the host memory through the direct memory access mode of a second protocol, wherein the second protocol is the command protocol of the computing fast link protocol.
In another aspect, after the host memory is accessed through the direct memory access mode of the second protocol, the method further includes:
recording, by the recorder, the access frequency of the first physical pages migrated from the graphics processing unit to the memory of the host, and detecting whether the video memory capacity of the graphics processing unit meets the preset requirement;
if not, returning to the step of acquiring the access frequency of each physical page recorded by the recorder;
and if so, returning to the step of determining the second physical pages of the host according to the frequency, recorded in the recorder, with which the graphics processing unit accesses the memory pages of the host.
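The post-migration path can be sketched as one function: serve the access from host memory (standing in for the second-protocol DMA read), update the recorder, then branch on the capacity check, consistent with the earlier steps in which cold pages move out while capacity is insufficient and hot pages move back once it is sufficient. All names and values are illustrative.

```python
# Access to a page now resident in host memory: read it via DMA over the
# second (command) protocol, record the access, then branch on capacity.

REQUIRED_FREE = 4   # preset capacity requirement, in pages (assumed)

def access_migrated_page(page, host_pages, access_freq, gpu_free_pages):
    assert page in host_pages                 # page was migrated earlier
    data = host_pages[page]                   # stand-in for the DMA read
    access_freq[page] = access_freq.get(page, 0) + 1   # recorder update
    if gpu_free_pages < REQUIRED_FREE:
        next_step = "re-check cold pages"     # back to reading frequencies
    else:
        next_step = "migrate hot host pages back"
    return data, next_step

freq = {}
data, step = access_migrated_page("w0", {"w0": b"weights"}, freq,
                                  gpu_free_pages=6)
```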
In another aspect, the graphics processing unit further includes a computing fast link command register, and storing the access-frequency data recorded by the recorder in the memory of the host according to the bitmap data structure through the first protocol includes:
receiving, through the computing fast link command register, a command sent by the host using a second protocol that represents synchronizing the access frequency of each physical page to the memory of the host, wherein the second protocol is the command protocol of the computing fast link protocol;
and, according to the command received by the computing fast link command register, storing the access-frequency data recorded by the recorder in the memory of the host according to the bitmap data structure through the first protocol.
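This command-register flow can be sketched with plain objects: the host writes a sync command into the register (over the command protocol), and the device, on seeing it, pushes the recorder's bitmap data to host memory (over the cache protocol). The register class, command name, and transfers are stand-ins, not the real hardware interface.

```python
# Hypothetical model of the computing fast link command register handshake.

class CommandRegister:
    def __init__(self):
        self.pending = None
    def write(self, command):          # host side, second (command) protocol
        self.pending = command
    def take(self):                    # device side reads and clears
        cmd, self.pending = self.pending, None
        return cmd

def device_poll(reg, recorder_bitmaps, host_memory):
    if reg.take() == "SYNC_ACCESS_FREQ":
        # Stand-in for a first-protocol write of the bitmap data to the host.
        host_memory["access_freq"] = [bytes(b) for b in recorder_bitmaps]

reg = CommandRegister()
host_mem = {}
reg.write("SYNC_ACCESS_FREQ")
device_poll(reg, [bytearray(b"\x05"), bytearray(b"\x02")], host_mem)
```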
To solve the above technical problems, the present invention further provides a server including a host and a graphics processing unit, wherein the graphics processing unit is connected to the host, both the graphics processing unit and the host include computing fast link interfaces, and the graphics processing unit includes a recorder connected respectively to a device-side coherence engine, a device-side memory management unit, and a device-side page table buffer.
The graphics processing unit is configured to: acquire a memory request required for processing an artificial intelligence model, wherein the memory request includes a request directed to the video memory of the graphics processing unit and/or a request directed to the memory of the host; when it is detected that the video memory capacity of the graphics processing unit does not meet a preset requirement, acquire the access frequency of each physical page recorded by the recorder, wherein the physical pages include host memory pages accessed by the graphics processing unit as recorded in the device-side coherence engine, device-side physical memory pages recorded by the device-side memory management unit, and physical pages looked up through the device-side page table buffer; determine first physical pages in the video memory of the graphics processing unit according to the access frequency of each physical page recorded by the recorder, wherein a first physical page is a physical page whose access frequency is lower than a preset count; and move the first physical pages in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, so as to complete the response to the memory request.
In another aspect, the graphics processing unit is further configured to send a data read request to the device-side coherence engine through an accelerator compute core; send the data read request to a host agent through the device-side coherence engine; and receive, through the device-side coherence engine, the data returned by the host agent and forward it to the accelerator compute core.
The host agent sending data to the device-side coherence engine includes: the host agent queries the central processing unit as to whether the data is in the central processing unit cache; if so, the host agent receives the data from the central processing unit cache and sends it to the device-side coherence engine; if not, after the data in the host memory is loaded into the central processing unit cache, the host agent receives the data from the central processing unit cache and sends it to the device-side coherence engine.
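The host-agent read path above can be sketched as a cache lookup with a load-on-miss fallback. The dictionaries below are illustrative stand-ins for the CPU cache and host memory; the real coherence traffic is not modeled.

```python
# Host agent serving a device-side coherence engine read: hit in the CPU
# cache returns cached data; a miss first loads host memory into the cache.

def host_agent_read(addr, cpu_cache, host_memory):
    if addr in cpu_cache:                 # data already cached by the CPU
        return cpu_cache[addr], "hit"
    cpu_cache[addr] = host_memory[addr]   # load host memory into the cache
    return cpu_cache[addr], "miss"

cache = {0x10: b"hot"}
memory = {0x10: b"hot", 0x20: b"cold"}
hit_data, s1 = host_agent_read(0x10, cache, memory)
miss_data, s2 = host_agent_read(0x20, cache, memory)
```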
In another aspect, the graphics processing unit further includes a computing fast link command register; the host includes a page scheduler; and the computing fast link command register is connected respectively to the recorder and the page scheduler.
When the computing fast link command register receives a command from the host that represents recording the access frequency of physical pages, it sends the command to the recorder so that the recorder records the access frequency of the physical pages; and/or, when it receives a command from the host that represents page scheduling, it sends an instruction to the page scheduler through the command protocol of the computing fast link protocol so that the page scheduler performs page scheduling.
To solve the above technical problems, the present invention further provides a computer program product including computer programs/instructions which, when executed by a processor, implement the steps of the above graphics processing unit video memory processing method.
In order to solve the technical problem, the present invention further provides a graphics processing unit video memory processing device, including:
A memory for storing a computer program;
and a processor for implementing the steps of the above graphics processing unit video memory processing method when executing the computer program.
In order to solve the above technical problem, the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program implements the steps of the graphics processing unit video memory processing method described above when executed by a processor.
The present invention provides a video memory processing method applied to a graphics processing unit. The method acquires a memory request required for processing an artificial intelligence model and, when it is detected that the video memory capacity of the graphics processing unit does not meet the preset requirement, acquires the access frequency of each physical page recorded by the recorder; it then determines the first physical pages in the video memory of the graphics processing unit (namely the physical pages whose access frequency is lower than the preset count) according to the recorded access frequencies, and moves those first physical pages to the memory of the host through the computing fast link protocol, so as to complete the response to the memory request.
The beneficial effects of the method are as follows. First, when the memory request required by the artificial intelligence model is processed and the video memory of the graphics processing unit cannot satisfy it, physical pages whose access frequency is below the preset count are migrated from the video memory of the graphics processing unit to host memory, which solves the problem of the limited capacity of the graphics processing unit and improves its availability and utilization. Second, both the graphics processing unit and the host include computing fast link interfaces, and the graphics processing unit includes a recorder connected respectively to the device-side coherence engine, the device-side memory management unit, and the device-side page table buffer; the physical pages the recorder can record include the host memory pages accessed by the graphics processing unit as recorded in the device-side coherence engine, the device-side physical memory pages recorded by the device-side memory management unit, and the physical pages looked up through the device-side page table buffer. In other words, the recorder holds the access frequencies of both the physical pages of the graphics processing unit and the physical pages of the host, so the access pattern of each physical page can be determined accurately from these frequencies, and the first physical pages in the video memory of the graphics processing unit can be moved to host memory precisely. Third, because physical pages are moved between the host and the graphics processing unit through the computing fast link protocol and based on access frequency, rather than by loading the data the model will need next into the video memory of the graphics processing unit, memory scheduling between the video memory of the graphics processing unit and host memory is transparent to upper layers, which improves the generality of the video memory processing method and makes it suitable for the processing requirements of different AI models. Furthermore, the graphics processing unit and the host interact through the computing fast link protocol, which improves the scale and efficiency of AI model training compared with interaction over the PCIe interconnection protocol; and since model training is performed in the graphics processing unit, the load on the host is relieved.
In addition, the recorder records the access frequency of each physical page and synchronizes it to the memory of the host, which reduces the occupation of the video memory of the graphics processing unit, provides a backup of the access-frequency data, and improves data safety; and because the access frequencies are stored in both the recorder and host memory as a bitmap data structure, the storage space occupied is reduced.
When it is detected that the video memory capacity of the graphics processing unit meets the preset requirement, that is, the video memory capacity is sufficient, the second physical pages in the host (physical pages whose access frequency is greater than or equal to the preset count) are migrated to the graphics processing unit, which avoids as far as possible the performance jitter caused by switching access data and prevents a performance ping-pong phenomenon.
Moreover, the request to access the memory of the host and the request to synchronize the access-frequency data of each physical page recorded in the recorder to the memory of the host are packaged together and sent to the computing fast link interface, which improves the efficiency of protocol transmission.
By computing the remaining video memory capacity that the graphics processing unit would have after a preset number of first physical pages, selected from all first physical pages in its video memory, are migrated, and determining the number of first physical pages to be migrated according to that remaining capacity, the amount of data to be migrated is reduced and the performance of the graphics processing unit is improved, compared with moving all first physical pages to the host; and when the first physical pages are moved, they are moved to the host in one batch, which avoids the impact of multiple moves on the performance of the graphics processing unit.
After a first physical page in the video memory of the graphics processing unit is moved to the memory of the host through the computing fast link protocol, the frequency of accesses to the host memory is recorded, which facilitates moving physical pages according to their access frequency later and ensures the usability of the graphics processing unit.
In addition, the invention further provides a server, a product, a video memory processing device of a graphics processing unit, and a computer-readable storage medium, which have technical features and effects identical or corresponding to those of the video memory processing method of the graphics processing unit described above.
Drawings
For a clearer description of embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a schematic diagram of a graphics processing unit and a host under the computing fast link protocol according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a graphics processing unit accessing host-side memory data through the computing fast link protocol according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for processing a video memory of a graphics processing unit according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating interaction of a page migration scheduling component according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a request multiplexing interface according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for requesting memory on a GPU according to an embodiment of the present invention;
FIG. 7 is a block diagram of a GPU memory processing device according to another embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present invention.
The core of the invention is to provide a video memory processing method for a graphics processing unit, together with a server, a product, equipment and a medium, so as to solve the technical problems that the available video memory capacity of the graphics processing unit is limited, the video memory utilization rate of the graphics processing unit is low, and the existing methods for addressing the limited available video memory capacity cannot be adapted to different AI models and therefore lack versatility.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. In order to solve the problem that the available memory capacity of the graphics processing unit is limited, in the embodiment of the invention, a graphics processing unit architecture design supporting a computing fast link (Compute Express Link, CXL) is first established, so that efficient communication can be performed between the central processor and the graphics processing unit through a computing fast link protocol.
After the system is started, the graphics processing unit video memory and the central processor memory are uniformly addressed, and an application program can access memory data through unified memory addresses. In general, a device conforming to the computing fast link protocol can manage its memory in two modes: in the host bias mode, the memory of the device is managed by the host, and the device must obtain the host's approval to access its own memory; in the device bias mode, the device manages its own memory without host approval. The problem considered by the present invention is insufficient graphics processing unit video memory during AI model training, where the main usage scenario is the device accessing its own memory. If the host bias mode were adopted, the graphics processing unit video memory would be managed by the host and the graphics processing unit would need the host's permission to access its own memory, causing access latency; this problem does not exist in the device bias mode. Therefore, the embodiments of the present invention mainly consider devices conforming to the computing fast link protocol in the device bias mode.
FIG. 1 is a schematic diagram of a graphics processing unit and a host under the computing fast link protocol according to an embodiment of the present invention. As shown in FIG. 1, data is transmitted between the host 1 and the graphics processing unit 2 (i.e., an accelerator) through the computing fast link protocol. It should be noted that the computing fast link protocol comprises three sub-protocols: a cache protocol (CXL.cache), a command protocol (CXL.io) and a memory protocol (CXL.mem). Devices are divided into three types according to which combination of sub-protocols they support; the second type of CXL device supports all three sub-protocols.
FIG. 1 takes a second-type CXL device as an example of interaction with a host. The host 1 comprises a central processor core, a computing fast link interface, a host agent, a host memory management unit and a host page table buffer; the host agent, the host memory management unit and the host page table buffer are all connected with the central processor core, and the host agent is connected with the computing fast link interface located in the host 1. The computing fast link interface in the host 1 is responsible for receiving and transmitting computing fast link protocol data; the host agent is responsible for managing the consistency of the central processor memory and the graphics processing unit memory, resolving the conflicts that arise when the central processor core (including its cache) and the graphics processing unit core (including its cache) access system memory; the host memory management unit (Memory Management Unit, MMU) is responsible for managing the mapping from application virtual memory addresses to physical addresses; and the host page table buffer (Translation Lookaside Buffer, TLB) is responsible for fast lookup of physical memory page tables.
The graphics processing unit 2 comprises a graphics processing unit core, a computing fast link interface, a device-side consistency engine, a device-side memory management unit, a device-side page table buffer, a computing fast link command register and a recorder; the device-side consistency engine, the device-side memory management unit and the device-side page table buffer are all connected with the recorder, and the recorder is connected with the graphics processing unit core. The device-side consistency engine (Device Coherence Engine, DCOH) in the graphics processing unit 2 is responsible for managing cache consistency of system memory together with the host agent of the central processor; the computing fast link command register is responsible for receiving device execution commands from the host side (such as starting a computation, or synchronizing the hotness of the host memory accessed by the graphics processing unit); the device-side memory management unit is responsible for managing the mapping from device-side virtual addresses to physical addresses; the device-side page table buffer is responsible for fast lookup of the device-side physical memory page table; and the recorder, which records the hotness of physical memory, tracks the device-side memory management unit, the device-side page table buffer and the device-side consistency engine and is responsible for recording the hotness of the physical pages accessed by the device, covering both the physical memory on the host and the physical memory on the device side. In the embodiment of the present invention, cold and hot pages are migrated according to the hotness of memory pages; for example, in FIG. 1 the small rectangles filled with oblique lines represent hot pages and the unfilled small rectangles represent cold pages.
It should be noted that, in the embodiment of the present invention, whether a page is cold or hot is determined by the access frequency of the graphics processing unit, not by accesses on the central processor side.
FIG. 2 is a schematic diagram of a GPU accessing host side memory data by computing a fast link protocol according to an embodiment of the present invention, where as shown in FIG. 2, the GPU accessing host includes the following steps:
1) An accelerator computing core in the graphic processing unit sends a data reading request to a device-side consistency engine;
2) The device-side consistency engine sends a data reading request to the host agent;
3) The host agent sends a query to the central processor as to whether the data is in the central processor cache;
4) If the data is not in the CPU cache, acquiring the data from the host memory, and loading the data into the CPU cache; if the data is in the CPU cache, the CPU returns the data to the host agent;
5) Performing consistency state maintenance record;
6) The equipment acquires data;
7) The accelerator computing core receives the data.
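The numbered flow above can be sketched as a minimal simulation. All class and method names (`HostAgent`, `DeviceCoherenceEngine`, `AcceleratorCore`) are illustrative assumptions introduced for this sketch, not part of the claimed design:

```python
# Hypothetical sketch of the CXL.cache read flow in FIG. 2: the accelerator
# core requests data through the device-side coherence engine and host agent;
# data is served from the CPU cache, loading from host memory on a miss.

class HostAgent:
    def __init__(self, host_memory):
        self.host_memory = host_memory      # physical address -> data
        self.cpu_cache = {}                 # address -> cached data

    def read(self, addr):
        # Steps 3/4: query the CPU cache; on a miss, load from host memory.
        if addr not in self.cpu_cache:
            self.cpu_cache[addr] = self.host_memory[addr]
        return self.cpu_cache[addr]

class DeviceCoherenceEngine:
    def __init__(self, host_agent):
        self.host_agent = host_agent
        self.coherence_log = []             # step 5: consistency-state records

    def read(self, addr):
        data = self.host_agent.read(addr)   # step 2: forward request to host agent
        self.coherence_log.append(addr)     # step 5: maintain coherence state
        return data

class AcceleratorCore:
    def __init__(self, dcoh):
        self.dcoh = dcoh

    def load(self, addr):
        # Steps 1, 6, 7: issue the request and receive the data.
        return self.dcoh.read(addr)

host_memory = {0x1000: b"tensor-bytes"}
core = AcceleratorCore(DeviceCoherenceEngine(HostAgent(host_memory)))
print(core.load(0x1000))  # GPU video memory is never involved on this path
```

Note that, as the surrounding text explains, the data moves at cache-line granularity straight into the accelerator core's cache; the GPU's own video memory does not appear anywhere in this path.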
As can be seen from FIG. 2, the video memory of the graphics processing unit is not involved when the graphics processing unit accesses host-side data. When the data requested by the graphics processing unit is not in the central processor cache, the data is read from the host memory into the central processor cache, then packaged by the host agent into a computing fast link data packet and transmitted to the graphics processing unit. Data can therefore be transferred between the graphics processing unit and the host at cache-line granularity (64 B) without transmitting whole pages (4 KB), improving the efficiency of data interaction between them. In addition, a direct benefit of using the computing fast link to interconnect the central processor and the graphics processing unit is that the graphics processing unit computing core can access host memory in a cache-coherent manner: the data need not pass through the graphics processing unit video memory but enters the cache of the accelerator computing core directly, reducing transmission overhead and benefiting computation on the graphics processing unit compared with the existing PCIE approach. In FIG. 1, the solid arrows show the path that transmits data using the computing fast link protocol and the dashed arrows show the path that does not; the former requires one step fewer than the latter.
Fig. 3 is a flowchart of a method for processing a video memory of a graphics processing unit according to an embodiment of the present invention, as shown in fig. 3, where the method includes:
S10: acquiring a memory request required for processing an artificial intelligence model;
wherein the memory request comprises a request sent to the video memory of the graphics processing unit and/or a request sent to the memory of the host;
S11: acquiring, when it is detected that the video memory capacity of the graphics processing unit does not meet the preset requirement, the access frequency of each physical page recorded by the recorder;
wherein the physical pages comprise the memory pages of the host accessed by the graphics processing unit as recorded in the device-side consistency engine, the device-side physical memory pages recorded by the device-side memory management unit, and the physical pages looked up by the device-side page table buffer;
S12: determining a first physical page in the video memory of the graphics processing unit according to the access frequency of each physical page recorded by the recorder, the first physical page being a physical page whose access frequency is less than a preset number of times;
S13: moving the first physical page in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, so as to complete the response to the memory request.
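Steps S10–S13 can be sketched as follows; the function names and the concrete threshold value are assumptions for illustration only, and the migration itself is represented by returning the selected pages rather than actually moving data:

```python
# Illustrative sketch of steps S10-S13: when GPU video memory cannot satisfy
# a memory request, physical pages accessed fewer than a preset number of
# times ("first physical pages", i.e. cold pages) are selected for migration
# to host memory over the computing fast link.

PRESET_TIMES = 3  # assumed threshold k separating cold pages from hot pages

def select_first_physical_pages(access_freq, threshold=PRESET_TIMES):
    """Step S12: return GPU-resident pages whose recorded access
    frequency is below the threshold."""
    return [page for page, freq in access_freq.items() if freq < threshold]

def handle_memory_request(gpu_free_pages, request_pages, access_freq):
    """Steps S11-S13: if free GPU memory cannot satisfy the request,
    return the cold pages that would be migrated to the host."""
    if gpu_free_pages >= request_pages:
        return []                         # capacity is sufficient, no migration
    return select_first_physical_pages(access_freq)

# Recorded frequencies for GPU-resident physical pages (hypothetical values):
freq = {0xA000: 0, 0xB000: 7, 0xC000: 2}
print(handle_memory_request(gpu_free_pages=1, request_pages=4,
                            access_freq=freq))  # cold pages 0xA000 and 0xC000
```

A two-state cold/hot split keeps the decision to a single comparison per page, matching the bitmap representation described later in this embodiment.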
In the context of AI model training, the existing schemes for exchanging data between the graphics processing unit video memory and the central processor memory are all based on the PCIE interconnect protocol, which is relatively inefficient. The AI model training process can be described by a data-flow directed acyclic graph, in which each node represents an operator operation and each operator has tensor data as input and output. The current strategy mainly works at the AI model training layer, designing and optimizing from the top down: tensor data that is temporarily unused during training is first swapped out to the central processor memory, and tensors about to be used are swapped back from the central processor memory into the graphics processing unit video memory, so data needed for upcoming operations must be loaded into the graphics processing unit in advance, causing frequent communication between the central processor and the graphics processing unit. However, this strategy is tailored to a specific AI model, may differ across models, and lacks versatility; therefore, a mechanism independent of upper-layer AI model training is needed to indirectly enlarge the usable video memory of the graphics processing unit, so that the video memory enhancement method is general.
Specifically, a memory request required for processing the artificial intelligence model is acquired. The memory request comprises a request sent to the video memory of the graphics processing unit and/or a request sent to the memory of the host. It is then detected whether the video memory capacity of the graphics processing unit meets the preset requirement; failing the preset requirement indicates that the video memory space of the graphics processing unit is insufficient. To overcome this capacity limitation, in this embodiment the first physical pages in the graphics processing unit (i.e. physical pages whose access frequency is less than the preset number of times) are determined according to the access frequencies of the physical pages recorded in the graphics processing unit. The preset number of times is not limited here and is determined according to the actual situation. It should be noted that, in the embodiment of the present invention, a physical page is quantitatively classified as a cold page or a hot page according to its access frequency. The hotness of a physical page could be defined over multiple states, but in the embodiment of the present invention, to reduce the space needed to store page states, only two states are defined, namely cold page and hot page: a cold page is a page accessed fewer than k times in the most recent period, and a hot page is a page accessed at least k times in that period.
After determining a first physical page in a video memory of the graphic processing unit, moving the first physical page to a memory of a host through a computing fast link protocol so as to complete response to the memory request.
When it is detected that the graphics processing unit has finished processing the artificial intelligence model, a command sent by the host for characterizing acquisition of the artificial intelligence model processing result is acquired through the computing fast link command register; the request content corresponding to this command comprises a request for part of the artificial intelligence model processing result or a request for the whole of it;
and part or all of the artificial intelligence model processing result is transmitted to the host through the memory protocol (CXL.mem) in the computing fast link protocol, according to the command for characterizing acquisition of the artificial intelligence model processing result.
The method performs artificial intelligence model processing in the graphics processing unit, relieving the pressure on the host. Meanwhile, the direct memory access (Direct Memory Access, DMA) approach carries a certain latency overhead and fetches all the data every time; the present application instead transmits part or all of the artificial intelligence model processing result to the host through the computing fast link protocol. The computing fast link technology keeps the memory space of the central processor consistent with the memory on the attached device (the graphics processing unit), allows resources to be shared for higher performance, reduces the complexity of the software stack and lowers overall system cost, enabling the host to process the results of the artificial intelligence model.
In the method provided by the embodiment of the present invention, firstly, when the memory request required by the artificial intelligence model is processed and the video memory of the graphics processing unit cannot satisfy it, physical pages whose access frequency is less than the preset number of times are migrated from the video memory of the graphics processing unit to the host memory, which solves the problem of limited graphics processing unit capacity and improves the availability and utilization rate of the graphics processing unit. Secondly, both the graphics processing unit and the host comprise computing fast link interfaces, and the graphics processing unit comprises a recorder connected respectively to the device-side consistency engine, the device-side memory management unit and the device-side page table buffer; the physical pages the recorder can record comprise the host memory pages accessed by the graphics processing unit as recorded in the device-side consistency engine, the device-side physical memory pages recorded by the device-side memory management unit, and the looked-up physical pages recorded by the device-side page table buffer. In other words, the recorder records the access frequencies of both graphics processing unit physical pages and host physical pages, so the access situation of each physical page can be determined accurately from its access frequency, and the first physical pages in the graphics processing unit video memory can be moved to the host memory accurately. Thirdly, since physical pages are moved between the host and the graphics processing unit through the computing fast link protocol on the basis of access frequency, rather than by loading the data the model will need soonest into the graphics processing unit video memory, the exchange between the graphics processing unit video memory and the host memory is a memory scheduling transparent to the upper layer; this improves the versatility of the video memory processing method of the graphics processing unit and suits the processing requirements of different AI models. Furthermore, the graphics processing unit and the host interact through the computing fast link protocol, which improves the scale and efficiency of AI model training compared with interaction over the PCIE interconnect protocol; and the training of the model is performed in the graphics processing unit, relieving the pressure on the host.
In order to reduce the occupied video memory of the graphics processing unit, in an implementation, recording the access frequency of each physical page by the recorder comprises:
storing the access frequency of each physical page in a recorder according to a bitmap data structure;
After recording the access frequency of each physical page by the recorder, the method further comprises:
Storing the data of the access frequency of each physical page recorded by the recorder in the memory of the host according to a bitmap data structure through a first protocol; the first protocol is a cache protocol in a computing fast link protocol.
Because the central processor and the graphics processing unit are interconnected by the computing fast link technology, the graphics processing unit can access host-side memory through the cache protocol (CXL.cache) interface in the computing fast link protocol. When the graphics processing unit accesses the host memory, the recorder can record, from the device-side consistency engine, the access frequency of the central processor memory pages used by the graphics processing unit, because the device-side consistency engine manages the consistency of accesses to the central processor memory by the central processor core and the graphics processing unit core; meanwhile, the recorder obtains the access frequency of device-side physical memory pages from the device-side memory management unit. The recorder's hotness sensing of memory pages runs in the background and is imperceptible to the application programs on the graphics processing unit. FIG. 4 is a schematic diagram illustrating interaction of the page migration scheduling component according to an embodiment of the present invention. As shown in FIG. 4, when the host interacts with the graphics processing unit, a page scheduler (located in the device driver) and an accelerator cold-and-hot page recording unit (which may be located in the host memory) are provided on the host side, and the graphics processing unit is provided with the computing fast link command register connected with the device memory and the recorder that records the hotness of physical pages. The page scheduler in the host interacts with the computing fast link command register in the graphics processing unit through the command protocol in the computing fast link protocol; the page scheduler in the host interacts with the recorder in the graphics processing unit through the cache protocol in the computing fast link protocol; and the accelerator cold-and-hot page recording unit in the host interacts with the recorder in the graphics processing unit through the cache protocol in the computing fast link protocol. Storing, through the first protocol, the data of the access frequency of each physical page recorded by the recorder in the memory of the host according to the bitmap data structure comprises:
Receiving a command sent by the host for characterizing the access frequency of each physical page in the memory of the synchronous host by using a second protocol and by calculating a fast link command register; wherein the second protocol is a command protocol in a computing fast link protocol;
According to the command received by the calculation quick link command register, the data of the access frequency of each physical page recorded by the recorder is stored in the memory of the host according to the bitmap data structure through the first protocol.
To reduce memory consumption on the graphics processing unit, as shown in FIG. 4, the page access frequencies gathered by the recorder are stored in the memory of the host; synchronization of the hot-and-cold change information of memory pages can be controlled in real time through the command protocol in the computing fast link protocol, while the page hotness data itself is transferred through the cache protocol in the computing fast link protocol. On the one hand, this makes full use of the host's cheaper, high-capacity memory; on the other hand, migration of cold and hot pages is completed by the device driver on the host side, and keeping the data structure that records page hotness on the host memory side allows scheduling decisions to be made more effectively.
The hotness of a page is defined as how often the page is accessed by the graphics processing unit computing core within a period of time. To reduce memory space, the record is stored using a bitmap data structure, and the hotness of each page is defined with only two states, "cold" and "hot", which can be stored in one bit (more states would require more storage space without necessarily yielding a greater overall performance gain than two states).
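The one-bit-per-page bitmap described above can be sketched as follows; the page-index mapping and class name are assumptions for illustration:

```python
# Minimal cold/hot bitmap recorder: one bit per physical page, 0 = cold,
# 1 = hot, packed into a bytearray so 1024 pages cost only 128 bytes.

class ColdHotBitmap:
    def __init__(self, num_pages):
        self.bits = bytearray((num_pages + 7) // 8)  # round up to whole bytes

    def set_hot(self, page_index, hot=True):
        byte, bit = divmod(page_index, 8)
        if hot:
            self.bits[byte] |= 1 << bit              # mark page hot
        else:
            self.bits[byte] &= ~(1 << bit)           # mark page cold again

    def is_hot(self, page_index):
        byte, bit = divmod(page_index, 8)
        return bool(self.bits[byte] >> bit & 1)

bm = ColdHotBitmap(num_pages=1024)
bm.set_hot(42)
print(bm.is_hot(42), bm.is_hot(43))  # True False
print(len(bm.bits))                  # 128
```

This whole structure is compact enough to be synchronized into host memory over the cache protocol, as the embodiment describes.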
In order to avoid the performance jitter generated by the access data switching as much as possible and prevent the performance ping-pong phenomenon, in implementation, the method for processing the video memory of the graphic processing unit further comprises:
Under the condition that the video memory capacity of the graphic processing unit meets the preset requirement, determining a second physical page of the host according to the frequency of the graphic processing unit accessing the memory page of the host recorded in the recorder; the second physical page is a physical page with the access frequency being greater than or equal to the preset times;
and migrating the second physical page of the host to the video memory of the graphic processing unit through the first protocol.
To prevent the graphics processing unit video memory from being exhausted, it is important to migrate cold pages to the central processor memory in time; and to prevent the performance ping-pong phenomenon (performance jitter caused by the accessed data switching back and forth between the fast and slow memories), the graphics processing unit video memory is not filled completely: part of the space is reserved for hot data to be migrated in, so that hot data can later be migrated directly into the graphics processing unit video memory. The current 64-bit address space fully supports unified addressing of the physical memory pages on the host and on the graphics processing unit, and unified addressing is completed when the graphics processing unit is powered on, so a module on the graphics processing unit (such as the recorder) can judge whether the physical page address of the data in a cache line belongs to the host or to the graphics processing unit, enabling the recording unit (recorder) of cold and hot data in the graphics processing unit to keep better records. As shown in FIG. 4, the recorder can synchronize the hotness data of pages into the bitmap data structure in the host memory through the cache protocol in the computing fast link protocol, sending the cache protocol data through the device-side consistency engine, thereby actively writing the page access records into the memory on the host side.
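The unified-addressing check described above can be sketched as a simple range test. The concrete address ranges below are illustrative assumptions (the real ranges are fixed at power-on enumeration):

```python
# Sketch of classifying a cache line's physical page under unified 64-bit
# addressing: host DRAM and GPU video memory occupy disjoint ranges of one
# physical address space, so ownership is a range comparison.

HOST_RANGE = (0x0000_0000_0000, 0x0040_0000_0000)   # e.g. 256 GiB host DRAM
GPU_RANGE = (0x0040_0000_0000, 0x0048_0000_0000)    # e.g. 32 GiB GPU memory

def owner_of(physical_addr):
    """Return which side a physical page address belongs to, so the
    recorder knows which bitmap entry to update."""
    if HOST_RANGE[0] <= physical_addr < HOST_RANGE[1]:
        return "host"
    if GPU_RANGE[0] <= physical_addr < GPU_RANGE[1]:
        return "gpu"
    raise ValueError("address outside the unified address space")

print(owner_of(0x0000_1000_0000))  # host
print(owner_of(0x0041_0000_0000))  # gpu
```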
In order to improve the transmission efficiency of the protocol, before the first physical page in the video memory of the graphics processing unit is moved to the memory of the host by calculating the fast link protocol, and/or before the second physical page of the host is moved to the video memory of the graphics processing unit by the first protocol, the method further comprises:
Acquiring a first request and a second request sent to a host; the first request is a request for representing accessing the memory of the host, and the second request is a request for representing accessing frequency data of each physical page recorded in the synchronous recorder to the memory of the host;
The first request and the second request are sent in packets to a computing fast link interface in the graphics processing unit.
FIG. 5 is a schematic diagram of the request multiplexing interface according to an embodiment of the present invention. As shown in FIG. 5, a multiplexing (MUX) component packages together the cache-protocol requests for accessing the host memory and the cache-protocol requests for synchronizing cold-and-hot data in the computing fast link protocol, and sends the packaged requests to the computing fast link interface.
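The MUX behaviour in FIG. 5 can be sketched as a small queue that batches both request kinds into one package; all names are hypothetical:

```python
# Hypothetical sketch of the request-multiplexing component: ordinary
# host-memory accesses and recorder hotness-synchronization writes are
# queued together and flushed to the CXL interface as a single batch,
# amortizing per-request protocol overhead.

class RequestMux:
    def __init__(self):
        self.pending = []

    def enqueue_memory_access(self, addr):
        self.pending.append(("mem_access", addr))         # first request type

    def enqueue_hotness_sync(self, page, hot):
        self.pending.append(("hotness_sync", page, hot))  # second request type

    def flush(self):
        """Package all pending requests into one batch for the computing
        fast link interface and clear the queue."""
        batch, self.pending = self.pending, []
        return batch

mux = RequestMux()
mux.enqueue_memory_access(0x2000)
mux.enqueue_hotness_sync(page=5, hot=True)
print(mux.flush())  # both request kinds travel in one packaged batch
```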
According to the method, the request for accessing the memory of the host and the request for representing the data of the access frequency of each physical page recorded in the synchronous recorder to the memory of the host are packaged and sent to the computing fast link interface, so that the efficiency of protocol transmission is improved.
The video memory processing method of the graphics processing unit further comprises: when it is determined, according to the access frequency of each physical page recorded by the recorder, that the video memory of the graphics processing unit contains no first physical page, accessing the memory of the host through the first protocol, with the memory pages of the host accessed by the graphics processing unit being recorded in the device-side consistency engine.
In an implementation, moving the first physical page in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol comprises:
obtaining the remaining video memory capacity the graphics processing unit would have after a preset number of first physical pages, out of all first physical pages in its video memory, are migrated; the preset number being smaller than the number of all first physical pages in the video memory of the graphics processing unit;
judging whether the remaining video memory capacity meets the preset requirement;
if not, returning to the step of obtaining the remaining video memory capacity of the graphics processing unit after a further preset number of first physical pages among all first physical pages in the video memory are marked for migration;
if yes, moving each first physical page to be migrated to the memory of the host through the first protocol.
In order to avoid the impact of moving cold data multiple times on the performance of the graphics processing unit, in an implementation, moving each first physical page to be migrated to the memory of the host through the first protocol comprises:
recording the information of each first physical page to be migrated, from the start of judging whether the remaining video memory capacity meets the preset requirement until the remaining video memory capacity meets the preset requirement;
and, after it is judged that the remaining capacity meets the preset requirement, moving all the first physical pages to be migrated to the memory of the host simultaneously through the first protocol, according to the recorded information of each first physical page to be migrated.
Determining a preset number of first physical pages to be migrated from all first physical pages in a video memory of a graphics processing unit comprises:
Acquiring the priority order of all first physical pages in a video memory of a graphic processing unit;
selecting a preset number of first physical pages from all the first physical pages in the video memory of the graphic processing unit according to the priority order of the first physical pages;
And taking the preset number of first physical pages as the first physical pages to be migrated.
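The selection loop above can be sketched as follows. The priority key (evicting least-accessed pages first), the page size, the batch size and all names are assumptions for illustration:

```python
# Sketch of the iterative selection: pick cold ("first physical") pages in
# priority order, a preset number per round, until the projected remaining
# capacity meets the requirement, then move the whole recorded batch in one
# combined operation instead of many separate moves.

PAGE_SIZE = 4096

def plan_migration(cold_pages, free_bytes, required_bytes, batch=2):
    """cold_pages: list of (page_addr, access_freq) tuples.
    Returns the pages to migrate together in a single move."""
    # Assumed priority order: least-accessed cold pages are evicted first.
    ordered = sorted(cold_pages, key=lambda p: p[1])
    to_migrate = []
    while free_bytes < required_bytes and len(to_migrate) < len(ordered):
        # Record another preset-sized group of pages to migrate ...
        nxt = ordered[len(to_migrate):len(to_migrate) + batch]
        to_migrate.extend(nxt)
        free_bytes += len(nxt) * PAGE_SIZE   # projected remaining capacity
    # ... and only now are they all moved to host memory at once.
    return [addr for addr, _ in to_migrate]

pages = [(0xA000, 2), (0xB000, 0), (0xC000, 1), (0xD000, 2)]
print([hex(a) for a in plan_migration(pages, free_bytes=0,
                                      required_bytes=3 * PAGE_SIZE)])
# ['0xb000', '0xc000', '0xa000', '0xd000']
```

Migrating only as many pages as the capacity check demands, and deferring the physical move until the plan is complete, matches the two optimizations the text names: less data migrated than moving every cold page, and a single move instead of many.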
The preset number and the priority order of the first physical pages are not limited here and are determined according to the actual situation. By computing the remaining video memory capacity the graphics processing unit would have after the first physical pages currently marked for migration are moved out, and determining the number of first physical pages to migrate according to that remaining capacity, the method reduces the amount of data to be migrated and improves the performance of the graphics processing unit compared with moving all first physical pages to the host; and when the first physical pages are moved, they are moved to the host in a single unified operation, avoiding the performance impact of multiple separate moves.
To facilitate subsequently continuing to move physical pages according to their access frequency, and to ensure the availability of the graphics processing unit, after the first physical page in the video memory of the graphics processing unit is moved to the memory of the host through the computing fast link protocol, the method further includes:
accessing the memory of the host through the direct memory access mode of a second protocol when an access to a first physical page that has been migrated from the graphics processing unit to the memory of the host is detected; wherein the second protocol is the command protocol in the computing fast link protocol.
After accessing the host memory through the direct memory access mode of the second protocol, the method further includes:
recording, through the recorder, the access frequency of the first physical pages in the memory migrated from the graphics processing unit to the host, and detecting whether the video memory capacity of the graphics processing unit meets the preset requirement;
if yes, returning to the step of acquiring the access frequency of each physical page recorded by the recorder;
if not, returning to the step of determining the second physical page of the host according to the frequency, recorded in the recorder, at which the graphics processing unit accesses the memory pages of the host.
After a first physical page in the video memory of the graphics processing unit is moved to the memory of the host through the computing fast link protocol, the method records the access frequency of the migrated first physical pages in the host memory and records the host memory access frequency, thereby facilitating the subsequent movement of physical pages according to their access frequency and ensuring the availability of the graphics processing unit.
It should be noted that optimizations targeted at a particular operator differ from the method of the present invention, which counts how heavily the graphics processing unit program uses each physical page and then schedules accordingly. A page can be used by only one of the central processing unit and the graphics processing unit at a time: if a central processing unit memory page is frequently accessed by the graphics processing unit, it needs to be imported into the graphics processing unit video memory in time, and if the access heat of some pages in the graphics processing unit video memory drops, they are swapped out to the central processing unit memory in time. Compared with optimizations for a particular AI model, the method is therefore more general, operates at a lower layer, and achieves higher performance.
To provide a better understanding of the present invention, the overall scheme of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. FIG. 6 is a flowchart of a method for handling a memory request on a graphics processing unit according to an embodiment of the present invention. As shown in FIG. 6, the method includes:
S14: acquiring a memory request on the graphics processing unit;
S15: judging whether there is enough graphics processing unit video memory; if not, proceeding to step S16; if yes, proceeding to step S25;
S16: selecting a part of the "cold" memory to be migrated to the central processing unit memory;
S17: judging whether any "cold" memory can be migrated; if yes, proceeding to step S18; if not, proceeding to step S24;
S18: judging whether the graphics processing unit video memory is sufficient after migration; if yes, proceeding to step S19; if not, returning to step S16;
S19: migrating the "cold" memory to the central processing unit memory through the scheduler;
S20: recording the frequency of memory accesses on the graphics processing unit;
S21: accessing the host memory through the direct memory access mode of the command protocol of the computing fast link protocol;
S22: recording the host memory access frequency through the recorder;
S23: the memory request is completed;
S24: accessing the host memory through the cache protocol of the computing fast link protocol; then returning to step S22;
S25: judging whether any "hot" host memory can be migrated to the graphics processing unit video memory; if yes, proceeding to step S26;
S26: migrating the "hot" host memory to the graphics processing unit video memory through the scheduler.
It should be noted that, in step S16, the "cold" memory to be migrated to the central processing unit memory may be determined by querying the hot and cold page records of the graphics processing unit. In step S25, whether there is "hot" host memory that can be migrated to the graphics processing unit video memory may be determined by querying the accelerator hot and cold page records, communicated to the page scheduler through the command protocol of the computing fast link protocol. In step S20, the frequency of memory accesses on the graphics processing unit is recorded; specifically, the recorder records the accelerator's hot and cold pages and synchronizes them to the host side in time. In step S15, if it is determined that there is enough graphics processing unit video memory, the memory request may alternatively be deemed completed directly.
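The steps S14 to S26 above can be modeled in a few lines of Python. The `MemState` container, the hot threshold, and the page-count bookkeeping are assumptions made purely for illustration; the real flow runs in hardware and driver code, and "cold" pages are identified from the recorder's hot/cold records rather than a full sort.

```python
# Simplified model of the FIG. 6 memory-request flow.

class MemState:
    """Page-granular view of one memory pool: page id -> access count."""
    def __init__(self, capacity_pages, pages):
        self.capacity = capacity_pages
        self.pages = dict(pages)

    def free(self):
        return self.capacity - len(self.pages)

def handle_request(gpu, host, need, hot_threshold=5):
    # S15: is there enough graphics processing unit video memory?
    if gpu.free() >= need:
        # S25/S26: pull "hot" host pages into the spare GPU memory.
        for pid, cnt in sorted(host.pages.items(), key=lambda kv: -kv[1]):
            if cnt >= hot_threshold and gpu.free() > need:
                gpu.pages[pid] = host.pages.pop(pid)
        return "completed"                                        # S23
    # S16/S18: select "cold" pages (lowest count first) until the
    # request would fit after migration.
    victims = []
    for pid, _ in sorted(gpu.pages.items(), key=lambda kv: kv[1]):
        victims.append(pid)
        if gpu.free() + len(victims) >= need:
            break
    if gpu.free() + len(victims) < need:                          # S17 "no"
        return "fallback: access host memory via cache protocol"  # S24
    for pid in victims:                                           # S19
        host.pages[pid] = gpu.pages.pop(pid)
    return "completed"                                            # S20-S23 follow

gpu = MemState(4, {"a": 9, "b": 1, "c": 0, "d": 7})  # video memory full
host = MemState(16, {})
print(handle_request(gpu, host, need=2))             # → completed
```

In the example, the two coldest pages ("c" and "b") are evicted to host memory so the two-page request fits.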
The application program on the host side is compiled by a compiler into an executable program (a command sequence) and then sent to the accelerator graphics processing unit, where the command sequence is executed. The data required by the program is transferred into the graphics processing unit video memory in advance. The memory addresses used during program execution are virtual addresses, which must first be translated into the physical address space by the memory management unit on the graphics processing unit; the device side is also configured with a page table buffer (fast table) structure for rapid virtual-to-real address translation. Memory allocation requests can arise during the program execution stage. When the graphics processing unit video memory is insufficient, the host memory can be accessed coherently through the cache protocol in the computing fast link protocol, because the physical memory is uniformly addressed; the recorder records the hot and cold information of each physical page and synchronizes it in time through the computing fast link protocol interface. The overall logic of a memory request on the graphics processing unit is shown in FIG. 6, which describes the flow of scheduling hot and cold data in graphics processing unit memory: first, an application program on the graphics processing unit initiates a memory request, and the memory management unit judges whether there is enough graphics processing unit video memory; if so, the memory request is completed, and the bitmap is additionally checked for "hot" memory on the host side, which, if present, is migrated to the graphics processing unit video memory in time.
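The virtual-to-real translation path just described — a fast-table (page table buffer) hit on the common path, with a page-table walk as the fallback — can be sketched as follows. The 4 KiB page size and the dictionary-based tables are assumptions for the sketch, not details prescribed by the text.

```python
# Minimal sketch of the device-side virtual-to-physical translation path:
# consult the page table buffer (fast table) first, fall back to a
# page-table walk on a miss, and fill the fast table afterwards.

PAGE_SHIFT = 12  # assumed 4 KiB pages

def translate(vaddr, tlb, page_table):
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
    if vpn in tlb:               # fast path: hit in the page table buffer
        pfn = tlb[vpn]
    else:                        # slow path: walk the page table, fill buffer
        pfn = page_table[vpn]
        tlb[vpn] = pfn
    return (pfn << PAGE_SHIFT) | offset

page_table = {0x10: 0x80, 0x11: 0x81}
tlb = {}
pa = translate(0x10_234, tlb, page_table)   # vpn 0x10, offset 0x234
print(hex(pa))                              # → 0x80234
```

The offset bits pass through unchanged; only the virtual page number is mapped, which is why the fast table can stay small.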
If the graphics processing unit does not have enough video memory, migrating part of the "cold" graphics processing unit video memory to the host memory is considered. If no "cold" graphics processing unit video memory exists, the host memory is accessed directly through the cache protocol in the computing fast link protocol, while the access frequency of the host memory is recorded. If "cold" graphics processing unit video memory does exist, a part of it is first migrated to the host memory and it is judged whether the current memory request is satisfied; if not, migration of "cold" graphics processing unit video memory continues until the current memory request is satisfied. When "cold" graphics processing unit video memory is migrated to the host memory, the frequency at which the graphics processing unit accesses those memory pages is maintained and recorded, so that the scheduler can later move them back into the graphics processing unit video memory.
In the method for improving graphics processing unit video memory utilization provided by the embodiment of the present invention, first, a graphics processing unit architecture designed to support the computing fast link protocol allows the host and the graphics processing unit to communicate efficiently through that protocol. Second, after unified addressing of the system physical memory is achieved by the computing fast link technology, the graphics processing unit tracks the physical memory pages it accesses and records the access frequency of pages in the host memory and pages in the graphics processing unit video memory. Finally, according to these access-frequency records, a hot/cold page exchange scheduling strategy swaps the corresponding hot and cold pages in time, so that the memory pages on the graphics processing unit are always the frequently accessed ones.
Based on the above embodiments, the present invention further provides embodiments of a graphics processing unit video memory processing device and a server corresponding to the graphics processing unit video memory processing method. It should be noted that the device embodiments are described from two perspectives: one based on functional modules and the other based on hardware.
An embodiment of the invention provides a graphics processing unit video memory processing device. This embodiment, described from the perspective of functional modules, includes:
a first acquisition module, configured to acquire a memory request required for processing an artificial intelligence model;
a second acquisition module, configured to acquire the access frequency of each physical page recorded by the recorder when it is detected that the video memory capacity of the graphics processing unit does not meet the preset requirement; wherein the physical pages include the memory pages of the host accessed by the graphics processing unit as recorded in the device-side coherence engine, the device-side physical memory pages recorded by the device-side memory management unit, and the physical pages looked up by the device-side page table buffer;
a first determining module, configured to determine a first physical page in the video memory of the graphics processing unit according to the access frequency of each physical page recorded by the recorder; wherein the first physical page is a physical page whose access frequency is less than a preset number of times;
and a moving module, configured to move the first physical page located in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, so as to complete the response to the memory request.
In some embodiments, the graphics processing unit video memory processing device includes a first recording module, configured to record the access frequency of each physical page by means of the recorder;
the first recording module is specifically configured to store the access frequency of each physical page in the recorder according to a bitmap data structure;
the device further includes a storage module, configured to store the access-frequency data of each physical page recorded by the recorder in the memory of the host according to the bitmap data structure through a first protocol; wherein the first protocol is the cache protocol in the computing fast link protocol.
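A bitmap-style recorder of the kind just described might look like the following sketch. The single-hot-bit-per-page layout and the threshold are assumptions made for illustration; the text only specifies that access-frequency data is kept in a bitmap data structure.

```python
# Sketch of a bitmap recorder: one bit per physical page marks it "hot"
# once its access count reaches a threshold, alongside raw counters.

class BitmapRecorder:
    def __init__(self, num_pages, hot_threshold=4):
        self.bitmap = bytearray((num_pages + 7) // 8)  # 1 bit per page
        self.counts = [0] * num_pages                  # raw access counts
        self.hot_threshold = hot_threshold

    def record_access(self, page):
        self.counts[page] += 1
        if self.counts[page] >= self.hot_threshold:
            self.bitmap[page // 8] |= 1 << (page % 8)  # mark page hot

    def is_hot(self, page):
        return bool(self.bitmap[page // 8] & (1 << (page % 8)))

rec = BitmapRecorder(num_pages=16)
for _ in range(5):
    rec.record_access(3)   # page 3 crosses the threshold
rec.record_access(7)       # page 7 stays cold
print(rec.is_hot(3), rec.is_hot(7))   # → True False
```

The compact `bitmap` is what would be synchronized to host memory over the first (cache) protocol; the raw counters stay local to the recorder.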
In some embodiments, the graphics processing unit video memory processing device further comprises:
a second determining module, configured to determine, when it is detected that the video memory capacity of the graphics processing unit meets a preset requirement, a second physical page of the host according to the frequency, recorded in the recorder, at which the graphics processing unit accesses the memory pages of the host; wherein the second physical page is a physical page whose access frequency is greater than or equal to the preset number of times;
and a migration module, configured to migrate the second physical page of the host to the video memory of the graphics processing unit through the first protocol.
In some embodiments, the graphics processing unit video memory processing device further comprises:
a third acquisition module, configured to acquire a first request and a second request sent to the host; wherein the first request is a request representing access to the memory of the host, and the second request is a request representing synchronization of the access-frequency data of each physical page recorded in the recorder to the memory of the host;
and a sending module, configured to package the first request and the second request and send them to the computing fast link interface in the graphics processing unit.
In some embodiments, the graphics processing unit video memory processing device further comprises:
an access and recording module, configured to, when it is determined from the access frequency of each physical page recorded by the recorder that the video memory of the graphics processing unit contains no first physical page, access the memory of the host through the first protocol and record, in the device-side coherence engine, the memory pages of the host accessed by the graphics processing unit.
In some embodiments, the moving module specifically includes:
a fourth acquisition module, configured to obtain the remaining video memory capacity of the graphics processing unit after a preset number of first physical pages, out of all first physical pages in the video memory of the graphics processing unit, are marked to be migrated; wherein the preset number is smaller than the number of all first physical pages in the video memory of the graphics processing unit;
a first judging module, configured to judge whether the remaining video memory capacity meets the preset requirement; if not, triggering the fourth acquisition module; if yes, triggering a first moving submodule;
and the first moving submodule, configured to move each first physical page to be migrated to the memory of the host through the first protocol.
In some embodiments, the first moving submodule specifically includes:
a second recording module, configured to record information of each first physical page to be migrated, from the moment the judgment of whether the remaining video memory capacity meets the preset requirement begins until the remaining video memory capacity meets the preset requirement;
and a second moving submodule, configured to, after it is judged that the remaining capacity meets the preset requirement, move all the first physical pages to be migrated to the memory of the host at the same time through the first protocol, according to the recorded information of each first physical page to be migrated.
In some embodiments, the graphics processing unit video memory processing device includes a third determining module, configured to determine a preset number of first physical pages to be migrated from all first physical pages in the video memory of the graphics processing unit;
the third determining module specifically includes:
a fifth acquisition module, configured to acquire the priority order of all first physical pages in the video memory of the graphics processing unit;
a selection module, configured to select a preset number of first physical pages from all the first physical pages in the video memory of the graphics processing unit according to the priority order of the first physical pages;
and a module configured to take the preset number of first physical pages as the first physical pages to be migrated.
In some embodiments, the graphics processing unit video memory processing device further comprises:
an access module, configured to access the memory of the host through the direct memory access mode of a second protocol when an access to a first physical page that has been migrated from the graphics processing unit to the memory of the host is detected; wherein the second protocol is the command protocol in the computing fast link protocol.
In some embodiments, the graphics processing unit video memory processing device further comprises:
a second judging module, configured to record, through the recorder, the access frequency of the first physical pages in the memory migrated from the graphics processing unit to the host, and to judge whether the video memory capacity of the graphics processing unit meets the preset requirement;
if yes, triggering the second acquisition module again;
if not, triggering the second determining module again.
In some embodiments, the storage module specifically includes:
a receiving module, configured to receive, through the computing fast link command register, a command sent by the host using a second protocol that represents synchronizing the access frequency of each physical page to the memory of the host; wherein the second protocol is the command protocol in the computing fast link protocol;
and a storage submodule, configured to store, according to the command received by the computing fast link command register and through the first protocol, the access-frequency data of each physical page recorded by the recorder in the memory of the host according to the bitmap data structure.
Since the device embodiments correspond to the method embodiments, reference may be made to the description of the method embodiments for details of the device embodiments, which are not repeated here. The effects are the same as above.
FIG. 7 is a block diagram of a graphics processing unit video memory processing device according to another embodiment of the present invention. This embodiment is described from the hardware perspective. As shown in FIG. 7, the graphics processing unit video memory processing device includes:
a memory 20 for storing a computer program;
a processor 21 for implementing the steps of the graphics processing unit video memory processing method mentioned in the above embodiments when executing the computer program.
Processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. Processor 21 may be implemented in at least one hardware form among a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). Processor 21 may also include a main processor and a coprocessor: the main processor, also called a CPU, processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, processor 21 may be integrated with a GPU responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, processor 21 may also include an Artificial Intelligence (AI) processor for handling computing operations related to machine learning.
Memory 20 may include one or more computer-readable storage media, which may be non-transitory. Memory 20 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, memory 20 is at least used for storing a computer program 201, which, when loaded and executed by processor 21, implements the relevant steps of the graphics processing unit video memory processing method disclosed in any of the foregoing embodiments. In addition, the resources stored in memory 20 may further include an operating system 202, data 203, and the like, stored transiently or permanently. Operating system 202 may include Windows, Unix, Linux, and so on. Data 203 may include, but is not limited to, the data involved in the graphics processing unit video memory processing method described above.
In some embodiments, the graphics processing unit video memory processing device may further include a display 22, an input/output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the structure shown in FIG. 7 does not limit the graphics processing unit video memory processing device, which may include more or fewer components than those shown.
The graphics processing unit video memory processing device provided by the embodiment of the present invention includes a memory and a processor; when the processor executes the program stored in the memory, it can implement the graphics processing unit video memory processing method described above, with the same effects.
This embodiment also provides a server, which includes a host and a graphics processing unit connected to the host; both the graphics processing unit and the host include a computing fast link interface, and the graphics processing unit includes a recorder connected to a device-side coherence engine, a device-side memory management unit, and a device-side page table buffer, respectively;
the graphics processing unit is configured to acquire a memory request required for processing an artificial intelligence model, wherein the memory request includes a request sent to the video memory of the graphics processing unit and/or a request sent to the memory of the host; to acquire, when it is detected that the video memory capacity of the graphics processing unit does not meet the preset requirement, the access frequency of each physical page recorded by the recorder, wherein the physical pages include the memory pages of the host accessed by the graphics processing unit as recorded in the device-side coherence engine, the device-side physical memory pages recorded by the device-side memory management unit, and the physical pages looked up by the device-side page table buffer; to determine, according to the access frequency of each physical page recorded by the recorder, a first physical page in the video memory of the graphics processing unit, wherein the first physical page is a physical page whose access frequency is less than a preset number of times; and to move the first physical page in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol, so as to complete the response to the memory request.
In some embodiments, the graphics processing unit is further configured to send a data read request to the device-side coherence engine through the accelerator computing core; to forward the data read request to the host agent through the device-side coherence engine; and to receive, through the device-side coherence engine, the data sent by the host agent and pass it to the accelerator computing core;
wherein the host agent sending data to the device-side coherence engine includes: the host agent sends a query to the central processing unit asking whether the data is in the central processing unit cache; if so, the host agent receives the data from the central processing unit cache and sends it to the device-side coherence engine; if not, after the data in the host memory is loaded into the central processing unit cache, the host agent receives the data from the central processing unit cache and sends it to the device-side coherence engine.
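The coherent read path above can be sketched as a toy model: the device-side coherence engine forwards a core's read to the host agent, which serves it from the CPU cache, filling the cache from host memory first on a miss. The class and method names are assumptions made for illustration only.

```python
# Toy model of the coherent read path: accelerator core -> device-side
# coherence engine -> host agent -> CPU cache (filled from host memory
# on a miss) -> back to the core.

class HostAgent:
    def __init__(self, host_memory):
        self.host_memory = host_memory   # addr -> value
        self.cpu_cache = {}              # subset of host_memory

    def read(self, addr):
        if addr not in self.cpu_cache:   # miss: load the line into the cache
            self.cpu_cache[addr] = self.host_memory[addr]
        return self.cpu_cache[addr]      # hit, or freshly filled

class CoherenceEngine:
    """Device-side coherence engine forwarding accelerator-core reads."""
    def __init__(self, host_agent):
        self.host_agent = host_agent

    def read_for_core(self, addr):
        return self.host_agent.read(addr)  # data returned to the compute core

agent = HostAgent({0x1000: 42})
engine = CoherenceEngine(agent)
print(engine.read_for_core(0x1000))        # → 42
```

Note that in both the hit and miss cases the data always passes through the CPU cache, which is what keeps the device's view coherent with the host's.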
In some embodiments, the graphics processing unit further includes a computing fast link command register, the host includes a page scheduler, and the computing fast link command register is connected to the recorder and the page scheduler, respectively;
when receiving a command sent by the host that represents the access frequency of a physical page, the computing fast link command register sends that command to the recorder so that the recorder can record the access frequency of the physical page; and/or, when receiving a command sent by the host that represents paging, the command is sent to the page scheduler through the command protocol in the computing fast link protocol so that the page scheduler can perform paging.
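The command-register dispatch just described — route frequency commands to the recorder and paging commands to the page scheduler — reduces to a small routing function. The command encoding and the list-based sinks are assumptions for the sketch.

```python
# Hedged sketch of command-register dispatch: commands from the host are
# routed to the recorder (access-frequency commands) or to the page
# scheduler (paging commands).

def dispatch(command, recorder_log, scheduler_log):
    kind, payload = command
    if kind == "record_frequency":
        recorder_log.append(payload)      # recorder notes the page access
    elif kind == "schedule_paging":
        scheduler_log.append(payload)     # page scheduler performs paging
    else:
        raise ValueError(f"unknown command: {kind!r}")

recorder_log, scheduler_log = [], []
dispatch(("record_frequency", {"page": 5}), recorder_log, scheduler_log)
dispatch(("schedule_paging", {"page": 5, "to": "gpu"}), recorder_log, scheduler_log)
print(len(recorder_log), len(scheduler_log))   # → 1 1
```

Keeping the two command streams separate mirrors the register's two connections: one to the recorder, one to the page scheduler.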
The server provided in this embodiment has the same or corresponding technical features as the graphics processing unit video memory processing method described above. Since the embodiments of that method have been described in detail, the server embodiments are not repeated here; the effects are the same as above.
This embodiment also provides a computer program product, including a computer program/instructions which, when executed by a processor, implement the steps of the graphics processing unit video memory processing method described above. The effects are the same as above.
Finally, the invention also provides a corresponding embodiment of a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps described in the method embodiments above.
It will be appreciated that the methods of the above embodiments, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored on a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium and used to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The computer-readable storage medium provided by the present invention implements the graphics processing unit video memory processing method described above, with the same effects.
The graphics processing unit video memory processing method, server, product, device, and medium provided by the present invention are described in detail above. Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts, the embodiments may refer to one another. For the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for relevant details. It should be noted that it will be apparent to those skilled in the art that the present invention may be modified and practiced without departing from the spirit of the present invention.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Claims (17)
1. A graphics processing unit video memory processing method, characterized in that the method is applied to a graphics processing unit, the graphics processing unit is connected with a host, both the graphics processing unit and the host comprise a computing fast link interface, and the graphics processing unit comprises a recorder connected to a device-side coherence engine, a device-side memory management unit, and a device-side page table buffer, respectively; the method comprises the following steps:
acquiring a memory request required for processing an artificial intelligence model;
acquiring, when the video memory capacity of the graphics processing unit does not meet a preset requirement, the access frequency of each physical page recorded by the recorder; wherein the physical pages comprise the memory pages of the host accessed by the graphics processing unit as recorded in the device-side coherence engine, the device-side physical memory pages recorded by the device-side memory management unit, and the physical pages looked up by the device-side page table buffer;
determining a first physical page in the video memory of the graphics processing unit according to the access frequency of each physical page recorded by the recorder; wherein the first physical page is a physical page whose access frequency is less than a preset number of times;
moving the first physical page in the video memory of the graphics processing unit to the memory of the host through a computing fast link protocol, so as to complete the response to the memory request;
wherein the graphics processing unit further comprises a computing fast link command register;
and, when receiving a command sent by the host that represents the access frequency of a physical page, the computing fast link command register sends that command to the recorder, so that the recorder can record the access frequency of the physical page.
2. The graphics processing unit video memory processing method according to claim 1, characterized in that recording the access frequency of each physical page by the recorder comprises:
storing the access frequency of each physical page in the recorder according to a bitmap data structure;
and after recording the access frequency of each physical page by the recorder, the method further comprises:
storing the access-frequency data of each physical page recorded by the recorder in the memory of the host according to the bitmap data structure through a first protocol; wherein the first protocol is the cache protocol in the computing fast link protocol.
3. The graphics processing unit video memory processing method according to claim 2, characterized by further comprising:
determining, when it is detected that the video memory capacity of the graphics processing unit meets the preset requirement, a second physical page of the host according to the frequency, recorded in the recorder, at which the graphics processing unit accesses the memory pages of the host; wherein the second physical page is a physical page whose access frequency is greater than or equal to the preset number of times;
and migrating the second physical page of the host to the video memory of the graphics processing unit through the first protocol.
4. The graphics processing unit video memory processing method according to claim 3, characterized in that before said moving the first physical page located in the video memory of the graphics processing unit to the memory of the host through the computing fast link protocol and/or before said migrating the second physical page of the host to the video memory of the graphics processing unit through the first protocol, the method further comprises:
acquiring a first request and a second request sent to the host; wherein the first request is a request representing access to the memory of the host, and the second request is a request representing synchronization of the access-frequency data of each physical page recorded in the recorder to the memory of the host;
and packaging the first request and the second request and sending them to the computing fast link interface in the graphics processing unit.
5. The graphics processing unit video memory processing method according to claim 3 or 4, characterized by further comprising:
accessing, when it is determined from the access frequency of each physical page recorded by the recorder that the video memory of the graphics processing unit contains no first physical page, the memory of the host through the first protocol, and recording in the device-side coherence engine the memory pages of the host accessed by the graphics processing unit.
6. The method of claim 5, wherein moving the first physical page located in the video memory of the graphics processing unit to the memory of the host via the Compute Express Link protocol comprises:
obtaining the remaining video memory capacity of the graphics processing unit after a preset number of first physical pages, selected from all the first physical pages in the video memory of the graphics processing unit, have been marked for migration, the preset number being smaller than the total number of first physical pages in the video memory of the graphics processing unit;
judging whether the remaining video memory capacity meets the preset requirement;
if not, returning to the step of obtaining the remaining video memory capacity after a further preset number of first physical pages have been marked for migration; and
if so, moving each first physical page to be migrated to the memory of the host via the first protocol.
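The loop of claims 6 and 7 can be sketched as a mark-then-batch procedure: mark a preset number of cold pages per iteration, re-check the projected free capacity, and only migrate the whole recorded batch once the requirement is met. A hypothetical Python model; the capacity units, function name, and batching rule are illustrative assumptions:

```python
def plan_migration(cold_pages, page_size, free_capacity, required_free, batch):
    """Mark `batch` cold ('first') pages per iteration until the projected
    free video memory reaches `required_free`; return the pages to migrate."""
    to_migrate = []
    remaining = list(cold_pages)
    while free_capacity + len(to_migrate) * page_size < required_free:
        if not remaining:
            raise RuntimeError("not enough cold pages to reach the required capacity")
        to_migrate.extend(remaining[:batch])   # claim 7: record only, don't move yet
        del remaining[:batch]
    return to_migrate                          # claim 7: move the whole batch at once

print(plan_migration([11, 12, 13, 14], page_size=1,
                     free_capacity=0, required_free=3, batch=2))  # [11, 12, 13, 14]
```

Deferring the actual data movement until the loop exits is what lets claim 7 issue all the page moves together instead of one transfer per iteration.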
7. The method of claim 6, wherein moving each first physical page to be migrated to the memory of the host via the first protocol comprises:
recording the information of each first physical page to be migrated, from the first judgment of whether the remaining video memory capacity meets the preset requirement until the remaining video memory capacity does meet it; and
after the remaining capacity is judged to meet the preset requirement, moving all the recorded first physical pages to be migrated to the memory of the host at once via the first protocol.
8. The method of claim 6, wherein determining the preset number of first physical pages to be migrated from all the first physical pages in the video memory of the graphics processing unit comprises:
obtaining the priority order of all the first physical pages in the video memory of the graphics processing unit;
selecting the preset number of first physical pages from all the first physical pages in the video memory of the graphics processing unit according to that priority order; and
taking the selected first physical pages as the first physical pages to be migrated.
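Claim 8's selection step, sketched under the assumption (not stated in the claim) that priority means lowest access count first; the function name and example values are illustrative:

```python
def select_batch(first_pages, counts, preset_number):
    # order the cold ("first") pages by an assumed priority: lowest count first
    ordered = sorted(first_pages, key=lambda p: counts[p])
    return ordered[:preset_number]

# pages 0, 2, 3 are cold; their counts are 4, 1, 2 -> pick the two coldest
print(select_batch([0, 2, 3], counts=[4, 99, 1, 2], preset_number=2))  # [2, 3]
```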
9. The method of claim 3, further comprising, after moving the first physical page located in the video memory of the graphics processing unit to the memory of the host via the Compute Express Link protocol:
when an access to a first physical page that has been migrated from the graphics processing unit to the memory of the host is detected, accessing the memory of the host in a direct memory access mode of a second protocol, wherein the second protocol is the command protocol in the Compute Express Link protocol.
10. The graphics processing unit video memory processing method according to claim 9, further comprising, after accessing the memory of the host in the direct memory access mode of the second protocol:
recording, by the recorder, the access frequency of the first physical page migrated from the graphics processing unit to the memory of the host, and detecting whether the video memory capacity of the graphics processing unit meets the preset requirement;
if so, returning to the step of obtaining the access frequency of each physical page recorded by the recorder; and
if not, returning to the step of determining the second physical page of the host according to the frequency, recorded in the recorder, with which the graphics processing unit accesses the memory pages of the host.
11. The graphics processing unit video memory processing method according to claim 2, wherein storing the access-frequency data of each physical page recorded by the recorder into the memory of the host according to the bitmap data structure via the first protocol comprises:
receiving, through the Compute Express Link command register, a command sent by the host using a second protocol for synchronizing the access frequency of each physical page into the memory of the host, wherein the second protocol is the command protocol in the Compute Express Link protocol; and
according to the command received by the Compute Express Link command register, storing the access-frequency data of each physical page recorded by the recorder into the memory of the host according to the bitmap data structure via the first protocol.
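The command-register paths of claims 11 and 12 amount to a small event dispatcher: the host writes a command into the register, and the register forwards it to the recorder, which either pushes its frequency data to host memory or counts one access. A self-contained toy model; the command names and the minimal `Recorder` stand-in are invented for illustration and are not part of the patent:

```python
class Recorder:
    """Minimal stand-in for the patent's recorder: per-page access counters."""
    def __init__(self, num_pages):
        self.counts = bytearray(num_pages)

    def record_access(self, page_id):
        self.counts[page_id] = min(self.counts[page_id] + 1, 255)

    def sync_to_host(self, host_memory):
        # claim 11: push the whole frequency bitmap into host memory
        host_memory[:len(self.counts)] = self.counts


class CxlCommandRegister:
    """Toy event path: the host writes a command; the register forwards it."""
    def __init__(self, recorder, host_memory):
        self.recorder = recorder
        self.host_memory = host_memory

    def write(self, command, arg=None):
        if command == "SYNC_FREQ":        # claim 11: host-triggered sync
            self.recorder.sync_to_host(self.host_memory)
        elif command == "RECORD_ACCESS":  # claim 12: host-triggered count
            self.recorder.record_access(arg)

host_mem = bytearray(4)
reg = CxlCommandRegister(Recorder(4), host_mem)
reg.write("RECORD_ACCESS", 1)
reg.write("RECORD_ACCESS", 1)
reg.write("SYNC_FREQ")
print(host_mem[1])  # 2
```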
12. A server, comprising a host and a graphics processing unit connected to the host, wherein both the graphics processing unit and the host comprise a Compute Express Link interface, and the graphics processing unit comprises a recorder connected, respectively, to a device-side coherence engine, a device-side memory management unit, and a device-side page table buffer;
the graphics processing unit is configured to: obtain a memory request required for processing an artificial intelligence model, the memory request comprising a request sent to the video memory of the graphics processing unit and/or a request sent to the memory of the host; when the video memory capacity of the graphics processing unit does not meet the preset requirement, obtain the access frequency of each physical page recorded by the recorder, the physical pages comprising the memory pages of the host accessed by the graphics processing unit as recorded in the device-side coherence engine, the device-side physical memory pages recorded by the device-side memory management unit, and the physical pages looked up by the device-side page table buffer; determine a first physical page in the video memory of the graphics processing unit according to the access frequency of each physical page recorded by the recorder, the first physical page being a physical page whose access frequency is smaller than the preset number of times; and move the first physical page in the video memory of the graphics processing unit to the memory of the host via the Compute Express Link protocol so as to complete the response to the memory request;
the graphics processing unit further comprises a Compute Express Link command register; and
upon receiving a command sent by the host for recording the access frequency of a physical page, the Compute Express Link command register sends the command to the recorder, so that the recorder records the access frequency of that physical page.
13. The server of claim 12, wherein the graphics processing unit is further configured to: send a data read request from an accelerator computing core to the device-side coherence engine; forward the data read request from the device-side coherence engine to a host agent; and receive, through the device-side coherence engine, the data returned by the host agent and deliver it to the accelerator computing core;
wherein the host agent returning data to the device-side coherence engine comprises: the host agent queries the central processing unit as to whether the data is in the central processing unit cache; if so, the host agent receives the data from the central processing unit cache and sends it to the device-side coherence engine; if not, after the data in the host memory has been loaded into the central processing unit cache, the host agent receives the data from the central processing unit cache and sends it to the device-side coherence engine.
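Claim 13's read path is a plain cache-lookup-with-fill flow. Modeled below with dicts standing in for the CPU cache and host memory (a sketch of the control flow only, not the hardware protocol; names are illustrative):

```python
def host_agent_fetch(addr, cpu_cache, host_memory):
    """Host agent side of claim 13: answer a device-side coherence engine
    read. On a CPU-cache miss, fill the cache from host memory first; the
    hit path and the post-fill path then return the same cached data."""
    if addr not in cpu_cache:
        cpu_cache[addr] = host_memory[addr]  # miss: load line into the CPU cache
    return cpu_cache[addr]                   # data always served from the cache

cache = {0x10: "cached"}
memory = {0x10: "stale", 0x20: "from-dram"}
print(host_agent_fetch(0x10, cache, memory))  # cached
print(host_agent_fetch(0x20, cache, memory))  # from-dram
```

Serving every answer out of the CPU cache, even on a miss, is what keeps the device-side view coherent with what the CPU itself would read.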
14. The server of claim 12, wherein the host comprises a page scheduler, and the Compute Express Link command register is connected to the recorder and to the page scheduler, respectively; and
upon receiving a command representing page scheduling sent by the host, the Compute Express Link command register sends an instruction to the page scheduler via the command protocol in the Compute Express Link protocol, so that the page scheduler performs the page scheduling.
15. A computer program product, comprising a computer program/instructions which, when executed by a processor, implements the steps of the graphics processing unit video memory processing method according to any one of claims 1 to 11.
16. A graphics processing unit video memory processing apparatus, comprising:
a memory for storing a computer program; and
a processor for implementing the steps of the graphics processing unit video memory processing method according to any one of claims 1 to 11 when executing the computer program.
17. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the graphics processing unit video memory processing method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410696331.7A CN118279126B (en) | 2024-05-31 | 2024-05-31 | Graphics processing unit video memory processing method, server, product, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118279126A (en) | 2024-07-02
CN118279126B (en) | 2024-08-30
Family
ID=91638591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410696331.7A Active CN118279126B (en) | 2024-05-31 | 2024-05-31 | Graphics processing unit video memory processing method, server, product, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118279126B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106293953A (en) * | 2015-06-08 | 2017-01-04 | 龙芯中科技术有限公司 | Method and system for accessing shared video memory data |
CN113467958A (en) * | 2021-09-02 | 2021-10-01 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and readable storage medium |
CN116610607A (en) * | 2023-05-19 | 2023-08-18 | 广东浪潮智慧计算技术有限公司 | Training method, device, equipment and medium for artificial intelligence model |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2950507A1 (en) * | 2014-05-28 | 2015-12-02 | Fujitsu Limited | Method and system for storing distributed graph data |
CN109144714A (en) * | 2017-06-19 | 2019-01-04 | 中兴通讯股份有限公司 | Memory management method and device |
US10430915B2 (en) * | 2017-12-28 | 2019-10-01 | Nvidia Corporation | Multi-GPU frame rendering |
CN112764668B (en) * | 2019-11-01 | 2024-07-05 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for expanding GPU memory |
US12086447B2 (en) * | 2019-12-18 | 2024-09-10 | Advanced Micro Devices, Inc. | Systems and methods for reducing instruction code memory footprint for multiple processes executed at a coprocessor |
CN112162855B (en) * | 2020-09-21 | 2022-07-29 | 南开大学 | GPU page miss processing method, system and medium based on page-locked memory |
CN113867963A (en) * | 2021-09-30 | 2021-12-31 | 联想(北京)有限公司 | Electronic equipment and processing method |
CN114860445A (en) * | 2022-05-10 | 2022-08-05 | 北京潞晨科技有限公司 | Tensor storage management method for large model training |
CN115080264A (en) * | 2022-05-18 | 2022-09-20 | 江苏华存电子科技有限公司 | Shared memory optimization method and system based on memory partitioning technology |
CN117149049A (en) * | 2022-05-24 | 2023-12-01 | 华为技术有限公司 | Memory access heat statistics method, related devices and equipment |
CN116149852A (en) * | 2022-12-30 | 2023-05-23 | 中国电信股份有限公司 | Scheduling method and device of storage resources, electronic equipment and storage medium |
CN116126742A (en) * | 2023-01-30 | 2023-05-16 | 苏州浪潮智能科技有限公司 | Memory access method, device, server and storage medium |
CN117194287A (en) * | 2023-10-08 | 2023-12-08 | 北京灵汐科技有限公司 | Artificial intelligent chip based on many-core structure, data processing method and processing system |
Also Published As
Publication number | Publication date |
---|---|
CN118279126A (en) | 2024-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6796304B2 (en) | Final level cache system and corresponding methods | |
US12019549B2 (en) | Intelligent content migration with borrowed memory | |
US11954042B2 (en) | Distributed computing based on memory as a service | |
CN106484628A (en) | Transaction-based hybrid memory module | |
WO2013113206A1 (en) | Smart cache and smart terminal | |
CN1972215A (en) | A remote memory sharing system and its implementation method | |
US20200379808A1 (en) | Throttle Memory as a Service based on Connectivity Bandwidth | |
CN106293944A (en) | Non-uniform I/O access system and optimization method in a virtualized multi-core environment | |
CN102073533A (en) | Multicore architecture supporting dynamic binary translation | |
CN117806833B (en) | Data processing system, method and medium | |
CN107562645B (en) | Memory page management method and computing device | |
US9183150B2 (en) | Memory sharing by processors | |
US11657002B2 (en) | Memory management unit (MMU) for accessing borrowed memory | |
US20150194198A1 (en) | Multi-core processor system, memory controller control method, and computer product | |
CN116383101A (en) | Memory access method, memory management unit, chip, device and storage medium | |
CN115481072A (en) | Inter-core data transmission method, multi-core chip and machine-readable storage medium | |
CN118279126B (en) | Graphics processing unit video memory processing method, server, product, equipment and medium | |
KR102069696B1 (en) | Appartus and method for controlling a cache | |
CN102043731A (en) | Cache system of storage system | |
CN107526528B (en) | Mechanism for realizing on-chip low-delay memory | |
JP2004240616A (en) | Memory controller and memory access control method | |
JP6209573B2 (en) | Information processing apparatus and information processing method | |
CN114780466B (en) | DMA-based optimization method for data copy delay | |
Li et al. | A new software cache structure on Sunway TaihuLight | |
CN117311638A (en) | Cache management method, cache management device and processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||