
CN117331679A - Data reasoning method, device, equipment and storage medium - Google Patents


Info

Publication number
CN117331679A
CN117331679A (application CN202210719432.2A)
Authority
CN
China
Prior art keywords
thread
data reasoning
gpu
queue
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210719432.2A
Other languages
Chinese (zh)
Inventor
李愈曈
卢凯敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3600 Technology Group Co ltd
Original Assignee
3600 Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 3600 Technology Group Co ltd
Priority application: CN202210719432.2A
Publication: CN117331679A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 — Allocation of resources to service a request
    • G06F 9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 — Computing arrangements using knowledge-based models
    • G06N 5/04 — Inference or reasoning models
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)

Abstract

The invention belongs to the technical field of computers and discloses a data reasoning method, apparatus, device and storage medium. When a data reasoning request is received, an embodiment of the invention allocates a corresponding CPU processing thread to the request; the CPU processing thread performs feature processing on the request and, when processing completes, constructs a corresponding data reasoning task from the feature processing result; the method then detects whether an idle GPU thread exists and, if so, allocates a corresponding GPU thread to the data reasoning task and executes the task through that thread. Because the CPU processing threads inside the device perform the complex feature processing, no complex encapsulation is needed, the request parameters shrink by several orders of magnitude, and network transmission pressure is relieved; the GPU threads then perform data reasoning on the feature processing results, so feature processing and data reasoning are isolated from each other, GPU utilization is ensured, and overall performance is improved.

Description

Data reasoning method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data reasoning method, apparatus, device, and storage medium.
Background
With the wide application of deep learning in the advertising and recommendation fields, the performance requirements placed on online services keep rising, and the computing power required by deep-learning models far exceeds what a central processing unit (CPU) can supply, so the overall performance of a CPU-only data reasoning service is low. Other devices therefore need to work cooperatively, and a real-time reasoning service that can run on a graphics processing unit (GPU) has become an essential component of advertising and recommendation systems.
Current mainstream GPU reasoning services can accept requests that take the model inputs as parameters and return the model's inference results, but they generally lack a feature-processing stage or can perform only simple feature processing. Deep graph models often use subgraph-sampling techniques and their model inputs are huge; if feature processing is completed externally and the results are then transmitted into the GPU reasoning service, the volume of transmitted data is enormous and network transmission becomes the performance bottleneck.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a data reasoning method, apparatus, device and storage medium, so as to solve the technical problem in the prior art that the overall performance of a data reasoning service based on a deep graph model is low.
In order to achieve the above object, the present invention provides a data reasoning method, comprising the steps of:
when a data reasoning request is received, a corresponding CPU processing thread is allocated for the data reasoning request;
carrying out feature processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a feature processing result when the processing is completed;
detecting whether an idle GPU thread exists;
if an idle GPU thread exists, a corresponding GPU thread is allocated to the data reasoning task, and the data reasoning task is executed through the GPU thread.
Optionally, the step of allocating a corresponding CPU processing thread to the data reasoning request when the data reasoning request is received includes:
detecting whether idle threads exist in a preset CPU thread pool or not when a data reasoning request is received;
if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
Optionally, after the step of detecting whether the idle thread exists in the preset CPU thread pool when the data reasoning request is received, the method further includes:
if no idle thread exists, detecting whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool reaches an upper limit;
if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
Optionally, after the step of detecting, if there is no idle thread, whether the queue element in the thread buffer queue corresponding to the preset CPU thread pool has reached the upper limit, the method further includes:
if the upper limit is reached, expanding the capacity of the preset CPU thread pool according to a preset capacity expansion rule, and synchronously expanding the capacity of the thread buffer queue when the capacity expansion is completed;
and when the synchronous capacity expansion of the thread buffer queue is completed, adding the data reasoning request into the thread buffer queue.
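The dispatch-or-buffer logic above can be sketched as follows. This is an illustrative sketch only: the patent leaves the "preset capacity expansion rule" unspecified, so a doubling rule is assumed here, and `CpuThreadPool`, `submit` and the counters are hypothetical names that model thread bookkeeping rather than real thread creation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical sketch: hand a request to an idle pooled thread if one
// exists; otherwise buffer it, first expanding the pool and the buffer
// queue in sync (assumed doubling rule) when the queue is at its limit.
struct CpuThreadPool {
    size_t idle_threads   = 2;   // threads currently free in the pool
    size_t pool_capacity  = 2;   // total threads the pool may hold
    size_t queue_capacity = 2;   // upper limit of the thread buffer queue
    std::deque<int> buffer;      // buffered request ids

    // Returns true if the request was handed to a thread immediately,
    // false if it was added to the buffer queue.
    bool submit(int request_id) {
        if (idle_threads > 0) {                 // idle thread exists
            --idle_threads;
            return true;
        }
        if (buffer.size() >= queue_capacity) {  // queue element at upper limit
            pool_capacity  *= 2;                // expand the pool ...
            queue_capacity *= 2;                // ... and the queue in sync
        }
        buffer.push_back(request_id);           // then enqueue the request
        return false;
    }
};
```

A request submitted while the buffer is full thus triggers the expansion first and is then enqueued, matching the order of the optional steps above.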
Optionally, after the step of detecting whether there is an idle GPU thread, the method further includes:
if no idle GPU thread exists, detecting whether a queue element in a preset GPU queue reaches an upper limit;
if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
Optionally, the step of adding the data reasoning task to the preset GPU queue includes:
detecting whether the data reasoning task has a priority identification or not;
and if the priority identification does not exist, adding the data reasoning task to the head of the queue in the preset GPU queue.
Optionally, after the step of detecting whether the data reasoning task has a priority identifier, the method further includes:
if the priority identifier exists, searching a priority level corresponding to the priority identifier;
determining a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue;
and adding the data reasoning task into the preset GPU queue according to the queue adding position.
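The priority-based enqueueing described above can be sketched as follows. This is a hedged illustration: the patent does not define the priority scale, so it is assumed here that a smaller number means a higher priority level, and `InferenceTask` and `enqueue_task` are hypothetical names. Per the text, a task without a priority identifier joins the head of the queue.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <utility>

struct InferenceTask {
    int id;
    std::optional<int> priority;  // empty = no priority identifier;
                                  // assumed: smaller value = higher priority
};

// Insert a task into the preset GPU queue at a position determined by its
// priority level relative to the tasks already queued.
void enqueue_task(std::deque<InferenceTask>& q, InferenceTask task) {
    if (!task.priority) {
        q.push_front(std::move(task));  // no identifier: queue head, per the text
        return;
    }
    // Walk past queued tasks whose priority is at least as high, then insert.
    auto it = q.begin();
    while (it != q.end() && it->priority && *it->priority <= *task.priority) {
        ++it;
    }
    q.insert(it, std::move(task));
}
```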
Optionally, after the step of detecting whether the queue element in the preset GPU queue has reached the upper limit if there is no idle GPU thread, the method further includes:
if the queue element in the preset GPU queue reaches the upper limit, a buffer sub-queue is created for the preset GPU queue;
and setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the buffer sub-queue.
Optionally, if there is an idle GPU thread, allocating a corresponding GPU thread to the data reasoning task, and after executing the step of the data reasoning task by the GPU thread, further including:
when the data reasoning task is executed, reading a task reasoning result from the GPU thread;
and transmitting the task reasoning result into the CPU processing thread so that the CPU processing thread encapsulates the task reasoning result and feeds back the processing result to the request end of the data reasoning request.
Optionally, after the step of transmitting the task inference result to the CPU processing thread to enable the CPU processing thread to perform packaging processing on the task inference result and feed back the processing result to the request end of the data inference request, the method further includes:
when the request end of the data reasoning request successfully receives the processing result, thread data in the CPU processing thread is cleared;
and when the thread data is cleared, adding the CPU processing thread into a preset CPU thread pool.
Optionally, the step of allocating a corresponding GPU thread to the data reasoning task includes:
acquiring all idle GPU threads, and determining the previous execution time corresponding to each idle GPU thread according to a history processing log;
and sorting the idle GPU threads in ascending order of previous execution time, and taking the first GPU thread in the sorting result as the GPU thread corresponding to the data reasoning task.
In addition, in order to achieve the above object, the present invention also provides a data reasoning apparatus, which includes the following modules:
the request processing module is used for distributing corresponding CPU processing threads for the data reasoning request when the data reasoning request is received;
the characteristic processing module is used for carrying out characteristic processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a characteristic processing result when the processing is completed;
the thread detection module is used for detecting whether idle GPU threads exist or not;
and the thread allocation module is used for allocating the corresponding GPU threads for the data reasoning task if the idle GPU threads exist, and executing the data reasoning task through the GPU threads.
Optionally, the request processing module is further configured to detect whether an idle thread exists in a preset CPU thread pool when a data reasoning request is received; if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
Optionally, the request processing module is further configured to detect whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool has reached an upper limit if no idle thread exists; if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
Optionally, the request processing module is further configured to expand the preset CPU thread pool according to a preset expansion rule if the upper limit has been reached, and perform synchronous expansion on the thread buffer queue when expansion is completed; and when the synchronous capacity expansion of the thread buffer queue is completed, adding the data reasoning request into the thread buffer queue.
Optionally, the thread allocation module is further configured to detect whether a queue element in the preset GPU queue has reached an upper limit if there is no idle GPU thread; if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
Optionally, the thread allocation module is further configured to detect whether a priority identifier exists in the data reasoning task; and if the priority identification does not exist, adding the data reasoning task to the head of the queue in the preset GPU queue.
Optionally, the thread allocation module is further configured to, if a priority identifier exists, search a priority level corresponding to the priority identifier; determining a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue; and adding the data reasoning task into the preset GPU queue according to the queue adding position.
In addition, in order to achieve the above object, the present invention also proposes a data reasoning apparatus including: a processor, a memory and a data reasoning program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the data reasoning method as described above.
In addition, in order to achieve the above object, the present invention also proposes a computer-readable storage medium having stored thereon a data inference program which, when executed by a processor, implements the steps of the data inference method as described above.
When a data reasoning request is received, a corresponding CPU processing thread is allocated to the request; the CPU processing thread performs feature processing on the request and, when processing completes, constructs a corresponding data reasoning task from the feature processing result; the method then detects whether an idle GPU thread exists and, if so, allocates a corresponding GPU thread to the data reasoning task and executes the task through that thread. Because the CPU processing threads inside the device perform the complex feature processing, the internal handling needs no complex encapsulation, the request parameters shrink by several orders of magnitude, and network transmission pressure is relieved; the GPU threads then perform data reasoning on the feature processing results, so feature processing and data reasoning are isolated from each other, GPU utilization is ensured, and overall performance is improved.
Drawings
FIG. 1 is a schematic diagram of an electronic device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of the data reasoning method of the present invention;
FIG. 3 is a flow chart of a second embodiment of the data reasoning method of the present invention;
FIG. 4 is a flow chart of a third embodiment of the data reasoning method of the present invention;
FIG. 5 is a schematic diagram illustrating a data flow according to an embodiment of the data reasoning method of the present invention;
fig. 6 is a block diagram showing the construction of a first embodiment of the data inference apparatus of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a data inference device in a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the electronic device may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (WI-FI) interface). The Memory 1005 may be a high-speed random access Memory (Random Access Memory, RAM) or a stable nonvolatile Memory (NVM), such as a disk Memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Those skilled in the art will appreciate that the structure shown in fig. 1 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a data inference program may be included in the memory 1005 as one type of storage medium.
In the electronic device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the electronic device of the present invention may be provided in a data inference device, and the electronic device invokes a data inference program stored in the memory 1005 through the processor 1001 and executes the data inference method provided by the embodiment of the present invention.
An embodiment of the present invention provides a data reasoning method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the data reasoning method of the present invention.
In this embodiment, the data reasoning method includes the following steps:
step S10: and when the data reasoning request is received, distributing a corresponding CPU processing thread for the data reasoning request.
The execution body of this embodiment may be the data reasoning device, which may be an electronic device such as a personal computer or a server, or any other device capable of implementing the same or similar functions; this embodiment does not limit the device. In this embodiment and the following embodiments, the data reasoning method of the present invention is described taking the data reasoning device as the example.
It should be noted that the data reasoning request may be a request sent to the data reasoning device by another server or user equipment, where the request may include data required for data reasoning during the recommendation or advertisement lookup process, for example: search conditions entered by the user, browser information used by the user, current login page of the user, city in which the user is located, access time of the user, and the like. The CPU processing thread may be a thread running on the CPU, and the CPU processing thread corresponding to the data reasoning request is a thread running on the CPU for processing data included in the data reasoning request.
In a specific implementation, the allocation of the corresponding CPU processing thread for the data reasoning request may be to create a thread running on the CPU as the CPU processing thread corresponding to the data reasoning request.
Step S20: and carrying out feature processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a feature processing result when the processing is completed.
It should be noted that, the feature processing of the data reasoning request by the CPU processing thread may be transmitting the data reasoning request to the CPU processing thread, where the CPU processing thread extracts data included in the request from the data reasoning request, and then invokes a model or algorithm in the data reasoning device to perform feature processing (e.g. data conversion, feature extraction, data cleaning, etc.) on the extracted data, so as to generate input data required by the real-time reasoning service running on the GPU, thereby obtaining a feature processing result.
It can be understood that when feature processing completes, the CPU processing thread holds the input data required by the real-time reasoning service running on the GPU, but at this point it is not yet certain whether that service can perform data reasoning immediately; to avoid data loss, a data reasoning task can be constructed from the feature processing result.
It should be noted that the data mainly stored in the data reasoning device is the graph structure of the deep graph model, which generally includes nodes, node features, node neighbors, edge information and the like. The data structures to be stored can therefore be expressed as node→feature, node→neighbor set and node→edge set, i.e., three key-value structures are required.
Taking the C++ language as an example, there are generally two ways to store a key-value structure:
1. std::map, implemented internally as a red-black tree, with O(log n) lookup time; its space usage comprises the keys, the values, and the red-black-tree bookkeeping;
2. std::unordered_map, implemented internally as a hash table, with O(1) lookup time when there is no collision and an extra cost ε (ε ≥ 1) when a collision occurs; its space usage comprises the keys, the values, and the hash-table bookkeeping.
To maximize lookup efficiency while minimizing storage space, the graph's storage structure can be optimized to: node→index, index→feature, index→neighbor-index set, and index→edge information, i.e., only one key-value structure and three linear tables are needed.
Looking up all three pieces of per-node data through three unordered_maps costs 3×(1+ε); after optimization the cost is 4+ε (one hash lookup plus three array accesses), so the optimized lookup takes less time. In terms of space, the optimization adds one index per node but removes the node key from two of the three structures; since a node key occupies far more space than an index, the optimized layout also uses less space.
Step S30: it is detected whether there is an idle GPU thread.
It should be noted that GPU threads may be threads used to perform data reasoning on a GPU. Each GPU thread is bound to a specific GPU, i.e., one GPU thread corresponds to one GPU, while one GPU may correspond to several GPU threads; the number of GPU threads a GPU can support is determined by its core count and the number of threads each core supports. For example, if the GPUs contained in the data reasoning device are single-core and support only two threads each, then each GPU corresponds to 2 GPU threads.
It will be appreciated that, because it is uncertain whether the GPU currently has free resources for performing data reasoning tasks, it is necessary to detect whether the GPU has free resources, and at this time it is possible to detect whether there are free GPU threads, i.e. whether there are GPU threads currently not performing tasks.
In actual use, whether idle GPU threads exist can be detected by running a preset GPU detection script, wherein the preset GPU detection script can comprise detection commands for checking the state of the GPU, and the preset GPU detection script can be preset by a manager of the data reasoning equipment.
Step S40: if the idle GPU threads exist, corresponding GPU threads are distributed to the data reasoning task, and the data reasoning task is executed through the GPU threads.
It can be understood that if there is an idle GPU thread, it means that there is an idle GPU in the data inference device at the current time, so that the data inference task can be processed, and then a corresponding GPU thread can be allocated to the data inference task at this time, and then the data inference task is executed through the allocated GPU thread, so as to perform data inference.
Further, in order to ensure that the resources of each GPU are fully and reasonably utilized, the step of allocating the corresponding GPU thread for the data reasoning task in this embodiment may include:
acquiring all idle GPU threads, and determining the previous execution time corresponding to each idle GPU thread according to a history processing log;
and sorting the idle GPU threads in ascending order of previous execution time, and taking the first GPU thread in the sorting result as the GPU thread corresponding to the data reasoning task.
It should be noted that the history processing log may be a log generated by the GPU thread when executing the data reasoning task, and the previous execution time may be a start time or an end time of executing the data reasoning task.
In practical use, if the data reasoning device has high performance and abundant resources, data reasoning tasks are processed quickly and several idle GPU threads may appear at once. To use all GPUs' resources reasonably, the idle GPU threads can be sorted in ascending order of their previous execution times taken from the history processing log, and the first GPU thread in the sorted result is taken as the GPU thread corresponding to the data reasoning task. This guarantees that the GPU threads take turns processing data reasoning tasks, so each GPU's resources are used fully and reasonably.
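The selection rule above can be sketched as follows; `IdleGpuThread` and `pick_gpu_thread` are hypothetical names, and `prev_exec_time` stands in for the timestamp recovered from the history processing log.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Among the idle GPU threads, pick the one whose previous execution time
// is earliest, so GPU threads take turns serving tasks.
struct IdleGpuThread {
    int  thread_id;
    long prev_exec_time;  // e.g. timestamp of its last task, from the log
};

int pick_gpu_thread(std::vector<IdleGpuThread> idle) {
    std::sort(idle.begin(), idle.end(),
              [](const IdleGpuThread& a, const IdleGpuThread& b) {
                  return a.prev_exec_time < b.prev_exec_time;  // ascending
              });
    return idle.front().thread_id;  // first thread in the sorted result
}
```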
Further, in order to facilitate the downstream task to perform the subsequent processing, after step S40 in this embodiment, the method may further include:
when the data reasoning task is executed, reading a task reasoning result from the GPU thread;
and transmitting the task reasoning result into the CPU processing thread so that the CPU processing thread encapsulates the task reasoning result and feeds back the processing result to the request end of the data reasoning request.
It should be noted that the request end may be a terminal that initiates the data reasoning request.
It can be understood that when the GPU thread finishes the data reasoning task, a task reasoning result is generated. To allow downstream tasks to continue processing based on that result, the task reasoning result can be read from the GPU thread and passed into the CPU processing thread that generated the data reasoning task; that thread then encapsulates the reasoning result, converts it into the format the request end needs for subsequent processing, and feeds the processing result back to the request end corresponding to the data reasoning request.
In practical use, the data reasoning request can also include a data format required by subsequent processing, so that when the CPU processing thread receives the task reasoning result, the data format required by the subsequent processing can be extracted from the data reasoning request, and then the task reasoning result is packaged and processed based on the extracted data format, so that the processing result is generated.
Further, in order to avoid that a great deal of performance is wasted by repeatedly creating and destroying threads and that interference is caused by residual data in the threads to other subsequent data processing processes, the step of transmitting the task reasoning result to the CPU processing thread to enable the CPU processing thread to package the task reasoning result and feed back the processing result to the request end of the data reasoning request in this embodiment may further include:
when the request end of the data reasoning request successfully receives the processing result, thread data in the CPU processing thread is cleared;
and when the thread data is cleared, adding the CPU processing thread into a preset CPU thread pool.
It should be noted that the preset CPU thread pool may be a thread pool created in advance for storing CPU threads. Clearing the thread data in the CPU processing thread may be clearing a local variable in the CPU processing thread.
It can be understood that if the request end of the data reasoning request successfully receives the processing result, the task of the CPU processing thread has been completed. Normally the thread could be destroyed, but since creating and destroying threads consumes a large amount of device resources, the CPU processing thread is reused instead: a preset CPU thread pool is created in advance to hold CPU processing threads whose tasks have completed, and when a new data reasoning request is received, a CPU thread is taken from this pool as the thread corresponding to that request.
In practical use, a thread usually holds local variables, and adding a thread to a thread pool does not clear them; if the thread were reused directly, a subsequent task might still read the previously generated local variables and the data could become confused. Therefore, the thread data in the CPU processing thread must be cleared first, and only when clearing is complete is the CPU processing thread added to the preset CPU thread pool, thereby avoiding data confusion.
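The recycle path can be sketched as follows. This is a simplified illustration: `CpuWorker`, `ThreadPool` and `recycle` are hypothetical names, and the thread-local data is modeled as a map of named values rather than real thread state.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Model of a pooled CPU processing thread and its thread-local data.
struct CpuWorker {
    int id;
    std::unordered_map<std::string, std::string> locals;  // thread data
};

struct ThreadPool {
    std::vector<CpuWorker> idle;

    // Clear the thread data first, and only then return the thread to the
    // pool, so a later task cannot read stale local variables.
    void recycle(CpuWorker worker) {
        worker.locals.clear();
        idle.push_back(std::move(worker));
    }
};
```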
In the embodiment, when a data reasoning request is received, a corresponding CPU processing thread is allocated for the data reasoning request; carrying out feature processing on the data reasoning request through a CPU processing thread, and constructing a corresponding data reasoning task according to a feature processing result when the processing is completed; detecting whether an idle GPU thread exists; if the idle GPU threads exist, the corresponding GPU threads are distributed for the data reasoning task, and the data reasoning task is executed through the GPU threads. The CPU processing thread in the equipment performs complex feature processing, the internal processing does not need complex encapsulation, the request parameters are reduced by several orders of magnitude, the network transmission pressure is relieved, and then the GPU thread performs data reasoning based on the processing result of the feature processing, so that the feature processing and the data reasoning are isolated, the GPU utilization rate is ensured, and the overall performance is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of a data reasoning method according to the present invention.
Based on the above first embodiment, the step S10 of the data reasoning method of the present embodiment may include:
step S101: and detecting whether idle threads exist in a preset CPU thread pool or not when a data reasoning request is received.
It should be noted that the preset CPU thread pool may be a thread pool storing a plurality of CPU threads. Detecting whether an idle thread exists in the preset CPU thread pool may be detecting whether a thread that is not performing feature processing exists in the pool; if such a thread exists, it is determined that an idle thread exists in the preset CPU thread pool; if not, it is determined that no idle thread exists in the preset CPU thread pool.
Step S102: if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
It can be understood that if the preset CPU thread pool has an idle thread, it indicates that there is an unoccupied CPU thread, so that at this time, one CPU thread can be directly extracted from the preset CPU thread pool as a CPU processing thread corresponding to the data reasoning request. If a plurality of idle threads exist in the preset CPU thread pool, at the moment, one CPU processing thread corresponding to the data reasoning request can be randomly selected from the idle CPU threads.
Further, since the data reasoning device may interface with multiple devices at the same time, all CPU threads in the preset CPU thread pool may be occupied at a busy moment, leaving a newly received data reasoning request unprocessed. Some devices may push a data reasoning request only once, so directly discarding the received request would cause data loss. In order to avoid this situation, after step S101, the method may further include:
if no idle thread exists, detecting whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool reaches an upper limit;
if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
It should be noted that, in order that a received data reasoning request can still be stored normally when no idle thread exists in the preset CPU thread pool, thereby avoiding data loss, a queue may be created in advance for the preset CPU thread pool as a thread buffer queue. If no idle thread exists in the preset CPU thread pool when the data reasoning request is received, the received request may be stored in the thread buffer queue; a queue listener of the thread buffer queue then continuously monitors the idle state of the threads in the preset CPU thread pool, and when an idle thread appears, takes the data reasoning request out of the thread buffer queue and allocates a corresponding CPU processing thread to it.
In practical use, a queue needs to be allocated corresponding resources when it is created, so the data it can store is limited; that is, the queue has a maximum length. If the queue elements in the queue have reached the upper limit, continuing to store data into it will cause errors or even data loss. Therefore, before the data reasoning request is stored in the thread buffer queue, it is necessary to detect whether the queue elements in the thread buffer queue corresponding to the preset CPU thread pool have reached the upper limit, and to store the data reasoning request in the thread buffer queue only when the upper limit has not been reached. Detecting whether the queue elements have reached the upper limit may be: acquiring the current number of queue elements in the thread buffer queue and comparing it with the queue's maximum length; if the number of queue elements equals the maximum length, it is determined that the queue elements in the thread buffer queue corresponding to the preset CPU thread pool have reached the upper limit; if the number of queue elements is smaller than the maximum length, it is determined that the upper limit has not been reached.
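The upper-limit check above can be sketched with Python's bounded `queue.Queue` (a hypothetical illustration; the specification does not name a particular queue implementation): storing into a full queue is refused rather than raising an unhandled error.

```python
from queue import Queue, Full

MAX_QUEUE_LEN = 3                            # assumed maximum length of the thread buffer queue
buffer_queue = Queue(maxsize=MAX_QUEUE_LEN)  # thread buffer queue with that upper limit

def try_buffer(request):
    # attempt to store the request only if the element count is below the maximum length
    try:
        buffer_queue.put_nowait(request)  # raises Full once the upper limit is reached
        return True
    except Full:
        return False  # upper limit reached; caller may expand the pool and queue instead

assert all(try_buffer(i) for i in range(MAX_QUEUE_LEN))  # first three requests are buffered
assert try_buffer(99) is False                           # the fourth is refused, not lost silently
```

A `False` return corresponds to the "upper limit reached" branch, which the following paragraphs handle by expansion.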
Further, in order to further avoid data loss, after the step of detecting whether the queue element in the thread buffer queue corresponding to the preset CPU thread pool has reached the upper limit if there is no idle thread in this embodiment, the method may further include:
if the upper limit is reached, expanding the capacity of the preset CPU thread pool according to a preset capacity expansion rule, and synchronously expanding the capacity of the thread buffer queue when the capacity expansion is completed;
and when the synchronous capacity expansion of the thread buffer queue is completed, adding the data reasoning request into the thread buffer queue.
It should be noted that, if the queue elements in the thread buffer queue corresponding to the preset CPU thread pool have reached the upper limit, this indicates that too many data reasoning requests have been received, or that the data is too complex and its processing too time-consuming, so that a large number of tasks have accumulated and the thread buffer queue is full; data cannot continue to be stored in the thread buffer queue at this time, while discarding the received data reasoning request would still lose data. Therefore, the preset CPU thread pool may be expanded according to the preset expansion rule, and when the expansion is completed, the thread buffer queue may be expanded synchronously so that it can accommodate more data; the data reasoning request is then added to the thread buffer queue, thereby further avoiding data loss.
In practical use, the preset expansion rule may be set in advance by a manager of the data reasoning device according to actual needs. For example, the preset expansion rule may be set as: detect the total number of threads in the preset CPU thread pool, and if the total number of threads is smaller than a preset thread total threshold, increase the total number of threads in the preset CPU thread pool by one and increase the queue length of the buffer queue by one in step. The preset thread total threshold may be determined by the upper limit of CPU performance in the data reasoning device.
In the embodiment, when a data reasoning request is received, whether idle threads exist in a preset CPU thread pool or not is detected; if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request. Because the CPU threads are managed through the thread pool, the performance waste of frequently creating and destroying the threads is avoided, and the utilization rate of equipment resources is improved, so that the overall performance of the service is further improved.
Referring to fig. 4, fig. 4 is a flowchart of a third embodiment of a data reasoning method according to the present invention.
Based on the above-mentioned first embodiment, after the step S30 of the data reasoning method of the present embodiment, the method may further include:
Step S40': if the idle GPU thread does not exist, detecting whether a queue element in a preset GPU queue reaches an upper limit.
It should be noted that the preset GPU queue may be a pre-created queue for storing data reasoning tasks.
It can be understood that if there is no idle GPU thread, the GPU has no idle resources to execute the generated data reasoning task at this time, so the task may be temporarily stored and executed later when the GPU has idle resources. In order to avoid data loss caused by errors when storing the data reasoning task, it may first be detected whether the queue elements in the preset GPU queue have reached the upper limit.
Detecting whether the number of the queue elements in the preset GPU queue reaches the upper limit can be to acquire the number of the queue elements in the preset GPU queue, compare the number of the queue elements with the maximum length of the preset GPU queue, and judge that the number of the queue elements in the preset GPU queue reaches the upper limit if the number of the queue elements is equal to the maximum length of the preset GPU queue; if the number of queue elements is less than the maximum length of the preset GPU queue, it can be determined that the queue elements in the preset GPU queue do not reach the upper limit.
Further, in order to avoid data loss, after step S40', the method of the present embodiment may further include:
if the queue element in the preset GPU queue reaches the upper limit, a buffer sub-queue is created for the preset GPU queue;
and setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the buffer sub-queue.
It should be noted that, if the queue element in the preset GPU queue has reached the upper limit, it indicates that the data cannot be added to the preset GPU queue continuously, and in order to avoid data loss, a buffer sub-queue may be created for the preset GPU queue at this time, then the CPU thread is set to a waiting state, and then the data reasoning task is added to the buffer sub-queue, so that the queue listener of the buffer sub-queue monitors the number of queue elements in the preset GPU queue in real time, and when the queue element in the preset GPU queue does not reach the upper limit, the data reasoning task is taken out from the buffer sub-queue and added to the preset GPU queue.
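A hypothetical sketch of this overflow path (names and the capacity are assumed, not from the specification): the buffer sub-queue is created only on demand, and a drain step stands in for the sub-queue's listener moving tasks back once the main GPU queue has room.

```python
from collections import deque

MAX_GPU_QUEUE = 2          # assumed upper limit of the preset GPU queue
gpu_queue: deque = deque()
sub_queue = None           # buffer sub-queue, created only when the main queue overflows

def enqueue_task(task):
    global sub_queue
    if len(gpu_queue) >= MAX_GPU_QUEUE:  # main queue at its upper limit
        if sub_queue is None:
            sub_queue = deque()          # create the buffer sub-queue on demand
        sub_queue.append(task)
    else:
        gpu_queue.append(task)

def drain():
    # what the sub-queue's listener would do when the main queue regains space
    while sub_queue and len(gpu_queue) < MAX_GPU_QUEUE:
        gpu_queue.append(sub_queue.popleft())

enqueue_task("t1"); enqueue_task("t2"); enqueue_task("t3")
assert list(gpu_queue) == ["t1", "t2"] and list(sub_queue) == ["t3"]
gpu_queue.popleft()  # a GPU thread consumed a task from the main queue
drain()
assert list(gpu_queue) == ["t2", "t3"]  # parked task migrated back; nothing was lost
```

No task is ever dropped: it is either in the main queue, parked in the sub-queue, or already taken by a GPU thread.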
Step S50': if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
It can be understood that if the queue element in the preset GPU queue does not reach the upper limit, the data may be stored in the preset GPU queue continuously, so that the CPU processing thread may be set to a waiting state, and the data reasoning task is added to the preset GPU queue, then the queue listener in the preset GPU queue may monitor continuously whether there is an idle GPU thread, and when there is an idle GPU thread, the data reasoning task is taken out from the preset GPU queue, and executed by the idle GPU thread, and then the CPU processing thread previously set to the waiting state is awakened, and the execution result of the data reasoning task is returned to the CPU processing thread.
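The wait-then-wake handoff described above can be sketched with an event per task (a hypothetical illustration; the specification does not prescribe a synchronization primitive, and the doubling "inference" is a stand-in for calling the real engine):

```python
import queue
import threading

gpu_queue: "queue.Queue" = queue.Queue()  # preset GPU queue of (task, event, result box)

def gpu_worker():
    # stand-in for a GPU thread: take a task, run it, then wake the waiting CPU thread
    while True:
        task, done, box = gpu_queue.get()
        box["result"] = task * 2  # placeholder for executing the inference engine
        done.set()                # wake the CPU processing thread that was set waiting

threading.Thread(target=gpu_worker, daemon=True).start()

def submit_and_wait(task):
    done, box = threading.Event(), {}
    gpu_queue.put((task, done, box))  # add the data reasoning task to the GPU queue
    done.wait(timeout=5)              # the CPU processing thread enters its waiting state
    return box["result"]              # execution result returned to the CPU thread

assert submit_and_wait(21) == 42
```

The `Event` plays the role of the wake-up signal: the CPU thread blocks without polling, and the GPU side releases it exactly when the task's result is ready.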
Further, since in an actual scenario, a part of tasks are urgent and need to be processed preferentially, when a data reasoning task is added to a preset GPU queue, it needs to be determined whether the data reasoning task is urgent and then how to add the task, and at this time, the step of adding the data reasoning task to the preset GPU queue in this embodiment may include:
detecting whether the data reasoning task has a priority identification or not;
and if the priority identification does not exist, adding the data reasoning task to the head of the queue in the preset GPU queue.
It should be noted that, whether the priority identifier exists in the data reasoning task may be detected, if the priority field exists in the data reasoning task, the attribute value corresponding to the priority field is the priority identifier, and at this time, it may be determined that the priority identifier exists in the data reasoning task; if the priority field does not exist, the data reasoning task can be judged to have no priority identification.
In practical use, the data of the queue is generally added from the head of the queue, and taken out from the tail of the queue, and if the data reasoning task does not have a priority identifier, the data reasoning task is not a task requiring priority processing, so that the data reasoning task can be directly added to the head of the queue of the preset GPU queue.
Further, in order to ensure that the task that can be processed preferentially first, after the step of detecting whether the data inference task has the priority identifier, the method further includes:
if the priority identifier exists, searching a priority level corresponding to the priority identifier;
determining a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue;
And adding the data reasoning task into the preset GPU queue according to the queue adding position.
It should be noted that, the searching for the priority level corresponding to the priority identifier may be searching for the priority level corresponding to the priority identifier according to a preset level identifier mapping table, where the preset level identifier mapping table may include a mapping relationship between the priority level and the priority identifier, where the mapping relationship may be preset by a manager of the data reasoning device. Adding the data inference task to the preset GPU queue according to the queue addition location may be inserting the data inference task to the preset GPU queue at the queue addition location.
In practical use, determining the queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue may proceed as follows: compare the priority level with the task priority of the task at the tail of the preset GPU queue; if the priority level is greater, the tail of the queue may be taken directly as the queue adding position of the data reasoning task. If the priority level is less than or equal to the tail task's priority, obtain the task priority of the task immediately in front of the tail task; if the priority level is greater than that task priority, the position in front of the tail task is set as the queue adding position of the data reasoning task. Continuing in this way, the priority level is compared with the task priorities of the tasks in the preset GPU queue one by one, from the tail toward the head, until the queue adding position is determined.
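A hypothetical sketch of this tail-first scan (names assumed; the list head is index 0, and, following the convention above, tasks are taken from the tail, so a higher-priority task is inserted nearer the tail):

```python
def add_with_priority(q, task, priority):
    """Insert (priority, task) into q, walking from the tail toward the head.

    q is a list whose head is index 0; tasks are consumed from the tail (q.pop()),
    so tasks nearer the tail execute sooner.
    """
    i = len(q)  # candidate insertion index, starting at the tail
    # keep moving toward the head while the task there outranks (or ties) the new one
    while i > 0 and priority <= q[i - 1][0]:
        i -= 1
    q.insert(i, (priority, task))

q = [(1, "a"), (4, "b")]            # head .. tail; "b" would run next
add_with_priority(q, "c", 7)        # outranks the tail task -> becomes the new tail
assert q == [(1, "a"), (4, "b"), (7, "c")]
add_with_priority(q, "d", 3)        # scanned past "c" and "b", lands in front of "b"
assert q == [(1, "a"), (3, "d"), (4, "b"), (7, "c")]
```

Ties keep the existing task nearer the tail, so equal-priority tasks are executed in arrival order.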
For ease of understanding, reference is made to fig. 5, which is not intended to limit the present invention; fig. 5 is a schematic diagram of the data flow of the data reasoning method of the present invention. The data reasoning device stores a graph structure and an inference engine (Trt engine); the inference engine is constructed based on an onnx model, and the onnx model is a general model obtained by converting a trained deep graph model (Tensorflow model). When the data reasoning device receives a data reasoning request, the request is stored in a first buffer queue, and a corresponding CPU processing Thread (thread_1-thread_n) is then allocated to it. The CPU processing thread may call the graph structure stored in the data reasoning device to perform the feature processing work; after the processing is finished, a corresponding data reasoning task is generated and stored in a second buffer queue, and when an idle GPU thread exists among the GPUs numbered 0 to n (GPU_0-GPU_n), the idle GPU thread calls the inference engine to execute the data reasoning task.
In the embodiment, if no idle GPU thread exists, whether a queue element in a preset GPU queue reaches an upper limit is detected; if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue. Because the data reasoning task is added to the preset GPU queue when the idle GPU thread does not exist, data loss is avoided, whether the queue element reaches the upper limit is detected before the data reasoning task is added to the preset GPU queue, the data reasoning task can be normally added to the preset GPU queue, and the phenomenon of data loss is further avoided.
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium is stored with a data reasoning program, and the data reasoning program realizes the steps of the data reasoning method when being executed by a processor.
Referring to fig. 6, fig. 6 is a block diagram showing the structure of a first embodiment of the data inference apparatus of the present invention.
As shown in fig. 6, the data reasoning device provided in the embodiment of the present invention includes:
the request processing module 10 is configured to allocate a corresponding CPU processing thread to a data reasoning request when the data reasoning request is received;
the feature processing module 20 is configured to perform feature processing on the data reasoning request through the CPU processing thread, and construct a corresponding data reasoning task according to a feature processing result when the processing is completed;
a thread detection module 30, configured to detect whether an idle GPU thread exists;
and the thread allocation module 40 is configured to allocate a corresponding GPU thread to the data inference task if there is an idle GPU thread, and execute the data inference task through the GPU thread.
In the embodiment, when a data reasoning request is received, a corresponding CPU processing thread is allocated for the data reasoning request; carrying out feature processing on the data reasoning request through a CPU processing thread, and constructing a corresponding data reasoning task according to a feature processing result when the processing is completed; detecting whether an idle GPU thread exists; if the idle GPU threads exist, the corresponding GPU threads are distributed for the data reasoning task, and the data reasoning task is executed through the GPU threads. The CPU processing thread in the equipment performs complex feature processing, the internal processing does not need complex encapsulation, the request parameters are reduced by several orders of magnitude, the network transmission pressure is relieved, and then the GPU thread performs data reasoning based on the processing result of the feature processing, so that the feature processing and the data reasoning are isolated, the GPU utilization rate is ensured, and the overall performance is improved.
Further, the request processing module 10 is further configured to detect whether an idle thread exists in a preset CPU thread pool when a data reasoning request is received; if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
Further, the request processing module 10 is further configured to detect whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool has reached an upper limit if no idle thread exists; if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
Further, the request processing module 10 is further configured to expand the preset CPU thread pool according to a preset expansion rule if the upper limit has been reached, and perform synchronous expansion on the thread buffer queue when the expansion is completed; and when the synchronous capacity expansion of the thread buffer queue is completed, adding the data reasoning request into the thread buffer queue.
Further, the thread allocation module 40 is further configured to detect whether a queue element in the preset GPU queue has reached an upper limit if there is no idle GPU thread; if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
Further, the thread allocation module 40 is further configured to detect whether the priority identifier exists in the data reasoning task; and if the priority identification does not exist, adding the data reasoning task to the head of the queue in the preset GPU queue.
Further, the thread allocation module 40 is further configured to, if a priority identifier exists, search a priority level corresponding to the priority identifier; determining a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue; and adding the data reasoning task into the preset GPU queue according to the queue adding position.
Further, the thread allocation module 40 is further configured to create a buffer sub-queue for the preset GPU queue if the queue element in the preset GPU queue has reached the upper limit; and setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the buffer sub-queue.
Further, the thread allocation module 40 is further configured to read a task inference result from the GPU thread when the data inference task is executed; and transmitting the task reasoning result into the CPU processing thread so that the CPU processing thread encapsulates the task reasoning result and feeds back the processing result to the request end of the data reasoning request.
Further, the thread allocation module 40 is further configured to clear thread data in the CPU processing thread when the request end of the data inference request successfully receives the processing result; and when the thread data is cleared, adding the CPU processing thread into a preset CPU thread pool.
Further, the thread allocation module 40 is further configured to obtain all idle GPU threads, and determine a previous execution time corresponding to each idle GPU thread according to the history log; and sequencing the idle GPU threads from small to large according to the previous execution time, and taking the GPU thread sequenced first in the sequencing result as the GPU thread corresponding to the data reasoning task.
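A hypothetical sketch of the selection rule above, assuming "previous execution time" means the duration of each idle GPU thread's last task as recorded in the history log (the specification does not define the unit, so seconds are assumed):

```python
# assumed history log: GPU thread name -> duration of its previous task, in seconds
history_log = {"gpu_0": 3.2, "gpu_1": 0.8, "gpu_2": 1.5}

def pick_gpu_thread(idle_threads, log):
    """Sort idle GPU threads by previous execution time, ascending, and take the first."""
    ranked = sorted(idle_threads, key=lambda t: log.get(t, 0.0))
    return ranked[0]

# gpu_1 had the smallest previous execution time, so it is chosen for the new task
assert pick_gpu_thread(["gpu_0", "gpu_1", "gpu_2"], history_log) == "gpu_1"
```

Threads absent from the log default to 0.0 here, i.e. never-used threads are preferred; this tie-breaking choice is an assumption, not stated in the specification.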
It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details that are not described in detail in this embodiment may refer to the data reasoning method provided in any embodiment of the present invention, which is not described herein again.
Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially, or in the part contributing to the prior art, in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
The invention discloses A1, a data reasoning method, which comprises the following steps:
when a data reasoning request is received, a corresponding CPU processing thread is allocated for the data reasoning request;
carrying out feature processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a feature processing result when the processing is completed;
detecting whether an idle GPU thread exists;
if the idle GPU threads exist, corresponding GPU threads are distributed to the data reasoning task, and the data reasoning task is executed through the GPU threads.
A2, the data reasoning method as set forth in A1, wherein the step of allocating a corresponding CPU processing thread to the data reasoning request when the data reasoning request is received comprises:
detecting whether idle threads exist in a preset CPU thread pool or not when a data reasoning request is received;
If the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
A3, the data reasoning method as set forth in A2, wherein after the step of detecting whether the idle thread exists in the preset CPU thread pool when the data reasoning request is received, the method further comprises:
if no idle thread exists, detecting whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool reaches an upper limit;
if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
A4, after the step of detecting whether the queue element in the thread buffer queue corresponding to the preset CPU thread pool has reached the upper limit if no idle thread exists, the data reasoning method described in A3 further includes:
if the upper limit is reached, expanding the capacity of the preset CPU thread pool according to a preset capacity expansion rule, and synchronously expanding the capacity of the thread buffer queue when the capacity expansion is completed;
and when the synchronous capacity expansion of the thread buffer queue is completed, adding the data reasoning request into the thread buffer queue.
A5, the data reasoning method as set forth in A1, after the step of detecting whether there is an idle GPU thread, further includes:
If no idle GPU thread exists, detecting whether a queue element in a preset GPU queue reaches an upper limit;
if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
A6, the data reasoning method as set forth in A5, wherein the step of adding the data reasoning task to the preset GPU queue includes:
detecting whether the data reasoning task has a priority identification or not;
and if the priority identification does not exist, adding the data reasoning task to the head of the queue in the preset GPU queue.
A7, the data reasoning method as set forth in A6, wherein the step of detecting whether the data reasoning task has a priority identifier includes:
if the priority identifier exists, searching a priority level corresponding to the priority identifier;
determining a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue;
and adding the data reasoning task into the preset GPU queue according to the queue adding position.
A8, the data reasoning method as set forth in A5, wherein after the step of detecting whether the queue element in the preset GPU queue has reached the upper limit if there is no idle GPU thread, further includes:
If the queue element in the preset GPU queue reaches the upper limit, a buffer sub-queue is created for the preset GPU queue;
and setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the buffer sub-queue.
A9, the data reasoning method as set forth in A1, wherein if there is an idle GPU thread, the step of allocating a corresponding GPU thread to the data reasoning task and executing the data reasoning task by the GPU thread further includes:
when the data reasoning task is executed, reading a task reasoning result from the GPU thread;
and transmitting the task reasoning result into the CPU processing thread so that the CPU processing thread encapsulates the task reasoning result and feeds back the processing result to the request end of the data reasoning request.
A10, the data reasoning method as set forth in A9, wherein the step of transmitting the task reasoning result into the CPU processing thread to enable the CPU processing thread to encapsulate the task reasoning result and feed back the processing result to the request end of the data reasoning request further includes:
when the request end of the data reasoning request successfully receives the processing result, thread data in the CPU processing thread is cleared;
And when the thread data is cleared, adding the CPU processing thread into a preset CPU thread pool.
A11, the data reasoning method as set forth in any of A1-A10, wherein the step of allocating a corresponding GPU thread to the data reasoning task includes:
acquiring all idle GPU threads, and determining the previous execution time corresponding to each idle GPU thread according to the historical processing log;
and sequencing the idle GPU threads from small to large according to the previous execution time, and taking the GPU thread sequenced first in the sequencing result as the GPU thread corresponding to the data reasoning task.
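The thread-selection rule of A11 amounts to a sort by the previous execution time recorded in the historical processing log. A minimal sketch, assuming the log maps each thread to the duration of its last task (the names are illustrative):

```python
def pick_gpu_thread(idle_threads, history_log):
    """Sort idle GPU threads by their previous execution time, ascending,
    and return the first in the sorted result (the historically fastest);
    threads with no history default to 0.0 and so sort first."""
    ranked = sorted(idle_threads, key=lambda t: history_log.get(t, 0.0))
    return ranked[0] if ranked else None
```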
The invention also discloses B12, a data reasoning device, wherein the data reasoning device comprises the following modules:
the request processing module is used for distributing corresponding CPU processing threads for the data reasoning request when the data reasoning request is received;
the characteristic processing module is used for carrying out characteristic processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a characteristic processing result when the processing is completed;
the thread detection module is used for detecting whether idle GPU threads exist or not;
and the thread allocation module is used for allocating the corresponding GPU threads for the data reasoning task if the idle GPU threads exist, and executing the data reasoning task through the GPU threads.
B13, the data reasoning device as described in B12, wherein the request processing module is further configured to detect whether an idle thread exists in a preset CPU thread pool when a data reasoning request is received; if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
B14, the data reasoning device as described in B13, wherein the request processing module is further configured to detect whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool has reached an upper limit if no idle thread exists; if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
B15, the data reasoning device as described in B14, wherein the request processing module is further configured to, if the upper limit is reached, expand the preset CPU thread pool according to a preset capacity expansion rule, and synchronously expand the thread buffer queue when the expansion is completed; and when the synchronous expansion of the thread buffer queue is completed, add the data reasoning request to the thread buffer queue.
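The admission path of B13-B15 can be sketched as a small pool with a bounded buffer. The doubling used below is only an assumed example of the "preset capacity expansion rule", and all names are illustrative; note the sketch follows the patent in buffering the request even right after an expansion.

```python
class CpuPool:
    def __init__(self, pool_size, buffer_cap):
        self.size = pool_size
        self.idle = list(range(pool_size))   # idle CPU worker ids
        self.buffer = []                     # thread buffer queue
        self.buffer_cap = buffer_cap

    def submit(self, request):
        if self.idle:                            # B13: idle thread exists
            return ("assigned", self.idle.pop())
        if len(self.buffer) < self.buffer_cap:   # B14: buffer has room
            self.buffer.append(request)
            return ("buffered", None)
        # B15: buffer full -> expand the pool, then expand the buffer in step
        new_workers = self.size                  # assumed doubling rule
        self.idle.extend(range(self.size, self.size + new_workers))
        self.size += new_workers
        self.buffer_cap *= 2
        self.buffer.append(request)
        return ("buffered", None)
```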
B16, the data reasoning device as described in B12, wherein the thread allocation module is further configured to detect whether a queue element in a preset GPU queue has reached an upper limit if there is no idle GPU thread; if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
B17, the data reasoning device as set forth in B16, wherein the thread allocation module is further configured to detect whether a priority identifier exists in the data reasoning task; and if the priority identifier does not exist, add the data reasoning task to the head of the queue in the preset GPU queue.
B18, the data reasoning device as described in B17, wherein the thread allocation module is further configured to, if the priority identifier exists, search for the priority level corresponding to the priority identifier; determine a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue; and add the data reasoning task to the preset GPU queue according to the queue adding position.
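The enqueue logic of B16-B18 can be sketched as follows. The patent does not define the priority encoding, so this sketch assumes a task dict with an optional numeric `"priority"` key where a higher number means higher priority; unmarked tasks go to the queue head per B17.

```python
def insert_task(gpu_queue, task):
    """gpu_queue is a list of (priority, task) pairs ordered head -> tail."""
    prio = task.get("priority")          # priority identifier, may be absent
    if prio is None:                     # B17: no identifier -> queue head
        gpu_queue.insert(0, (0, task))
        return
    # B18: place the task before the first queued task of lower priority,
    # so higher-priority tasks sit closer to the head.
    for i, (queued_prio, _) in enumerate(gpu_queue):
        if queued_prio < prio:
            gpu_queue.insert(i, (prio, task))
            return
    gpu_queue.append((prio, task))       # lowest priority so far -> tail
```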
The invention also discloses C19, a data reasoning device, wherein the data reasoning device comprises: a processor, a memory, and a data reasoning program stored on the memory and executable on the processor, wherein the data reasoning program, when executed by the processor, implements the steps of the data reasoning method described above.
The invention also discloses D20, a computer readable storage medium having stored thereon a data reasoning program, wherein the data reasoning program, when executed by a processor, implements the steps of the data reasoning method described above.

Claims (10)

1. A data reasoning method, characterized in that the data reasoning method comprises the steps of:
when a data reasoning request is received, a corresponding CPU processing thread is allocated for the data reasoning request;
carrying out feature processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a feature processing result when the processing is completed;
detecting whether an idle GPU thread exists;
if the idle GPU threads exist, corresponding GPU threads are distributed to the data reasoning task, and the data reasoning task is executed through the GPU threads.
2. The data reasoning method of claim 1, wherein the step of allocating a corresponding CPU processing thread to a data reasoning request upon receipt of the data reasoning request comprises:
detecting whether idle threads exist in a preset CPU thread pool or not when a data reasoning request is received;
if the idle thread exists, extracting a CPU thread from the preset CPU thread pool, and taking the CPU thread as a CPU processing thread corresponding to the data reasoning request.
3. The data reasoning method of claim 2, wherein after the step of detecting whether there is a free thread in the preset CPU thread pool when the data reasoning request is received, the method further comprises:
If no idle thread exists, detecting whether a queue element in a thread buffer queue corresponding to the preset CPU thread pool reaches an upper limit;
if the upper limit is not reached, the data reasoning request is added to the thread buffer queue.
4. The data reasoning method of claim 3, wherein after the step of detecting whether the queue element in the thread buffer queue corresponding to the preset CPU thread pool has reached the upper limit if there is no idle thread, the method further comprises:
if the upper limit is reached, expanding the capacity of the preset CPU thread pool according to a preset capacity expansion rule, and synchronously expanding the capacity of the thread buffer queue when the capacity expansion is completed;
and when the synchronous capacity expansion of the thread buffer queue is completed, adding the data reasoning request into the thread buffer queue.
5. The data reasoning method of claim 1, wherein after the step of detecting whether there is an idle GPU thread, further comprising:
if no idle GPU thread exists, detecting whether a queue element in a preset GPU queue reaches an upper limit;
if the queue element in the preset GPU queue does not reach the upper limit, setting the CPU processing thread to be in a waiting state, and adding the data reasoning task into the preset GPU queue.
6. The data reasoning method of claim 5, wherein the step of adding the data reasoning task to the preset GPU queue comprises:
detecting whether the data reasoning task has a priority identifier or not;
and if the priority identifier does not exist, adding the data reasoning task to the head of the queue in the preset GPU queue.
7. The data reasoning method of claim 6, wherein after the step of detecting whether the data reasoning task has a priority identifier, the method further comprises:
if the priority identifier exists, searching a priority level corresponding to the priority identifier;
determining a queue adding position according to the priority level and the task priority corresponding to each task in the preset GPU queue;
and adding the data reasoning task into the preset GPU queue according to the queue adding position.
8. A data reasoning apparatus, characterized in that the data reasoning apparatus comprises the following modules:
the request processing module is used for distributing corresponding CPU processing threads for the data reasoning request when the data reasoning request is received;
the characteristic processing module is used for carrying out characteristic processing on the data reasoning request through the CPU processing thread, and constructing a corresponding data reasoning task according to a characteristic processing result when the processing is completed;
The thread detection module is used for detecting whether idle GPU threads exist or not;
and the thread allocation module is used for allocating the corresponding GPU threads for the data reasoning task if the idle GPU threads exist, and executing the data reasoning task through the GPU threads.
9. A data reasoning apparatus, characterized in that the data reasoning apparatus comprises: a processor, a memory and a data reasoning program stored on the memory and executable on the processor, which data reasoning program, when executed by the processor, implements the steps of the data reasoning method as claimed in any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a data reasoning program, which when executed by a processor implements the steps of the data reasoning method as claimed in any of the claims 1-7.
CN202210719432.2A 2022-06-23 2022-06-23 Data reasoning method, device, equipment and storage medium Pending CN117331679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210719432.2A CN117331679A (en) 2022-06-23 2022-06-23 Data reasoning method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210719432.2A CN117331679A (en) 2022-06-23 2022-06-23 Data reasoning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117331679A true CN117331679A (en) 2024-01-02

Family

ID=89276014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210719432.2A Pending CN117331679A (en) 2022-06-23 2022-06-23 Data reasoning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117331679A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118535347A (en) * 2024-07-24 2024-08-23 之江实验室 High-performance computing power dispatching system and implementation method
CN118535347B (en) * 2024-07-24 2024-10-22 之江实验室 High-performance computing power dispatching system and implementation method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination