CN111290917A - YARN-based resource monitoring method and device and terminal equipment - Google Patents

YARN-based resource monitoring method and device and terminal equipment

Info

Publication number
CN111290917A
CN111290917A CN202010120079.7A CN202010120079A CN111290917A CN 111290917 A CN111290917 A CN 111290917A CN 202010120079 A CN202010120079 A CN 202010120079A CN 111290917 A CN111290917 A CN 111290917A
Authority
CN
China
Prior art keywords
task
running
yarn
target
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010120079.7A
Other languages
Chinese (zh)
Inventor
程飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yunzhirong Technology Co ltd
Original Assignee
Shenzhen Yunzhirong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yunzhirong Technology Co ltd
Priority to CN202010120079.7A
Publication of CN111290917A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/481 Exception handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application belongs to the technical field of big data and provides a YARN-based resource monitoring method, a resource monitoring device and terminal equipment. The method comprises the following steps: within a target time period, acquiring, at intervals of a first preset duration, running state information of each executing target task through a YARN interface, wherein the running state information comprises the running duration and the size of the resources occupied by running; and if an abnormal task is detected among the target tasks, terminating the abnormal task, wherein the abnormal task is a task whose running duration exceeds a running duration threshold and/or whose occupied resource size exceeds a preset resource threshold. The embodiments of the application can improve the resource utilization rate and the task running efficiency of a YARN-based distributed system.

Description

YARN-based resource monitoring method and device and terminal equipment
Technical Field
The application belongs to the technical field of big data, and particularly relates to a resource monitoring method and device based on YARN and terminal equipment.
Background
Yet Another Resource Negotiator (YARN) is a new Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications, and its introduction has brought great benefits to clusters in terms of utilization, unified resource management and data sharing.
Although YARN can optimize resource allocation in a Hadoop distributed system, tasks may still fail to execute, or execute too slowly, during periods when the system is busy.
Disclosure of Invention
In view of this, embodiments of the present application provide a YARN-based resource monitoring method and apparatus, and a terminal device, so as to solve the problem in the prior art that, in a YARN-based distributed system, a task cannot be executed or is executed too slowly because the system is busy and running resources are insufficient.
A first aspect of the embodiments of the present application provides a YARN-based resource monitoring method, which is characterized by including:
within a target time period, acquiring, at intervals of a first preset duration, running state information of each executing target task through a YARN interface, wherein the running state information comprises the running duration and the size of the resources occupied by running;
and if an abnormal task is detected among the target tasks, terminating the abnormal task, wherein the abnormal task is a task whose running duration exceeds a running duration threshold and/or whose occupied resource size exceeds a preset resource threshold.
Further, the target task is specifically a task outside a task white list, and before the running state information of each executing target task is acquired through the YARN interface at intervals of the first preset duration within the target time period, the method further includes:
setting a task white list.
Further, before acquiring the running state information of each executing target task through the YARN interface at intervals of the first preset duration within the target time period, the method further includes:
receiving a setting instruction, setting the target time period and the first preset duration according to the setting instruction, and setting the running duration threshold and/or the preset resource threshold according to the setting instruction.
Further, after the step of terminating the abnormal task if it is detected that the abnormal task exists in the target task, the method further includes:
storing the information of the abnormal task into a list to be processed;
and executing the tasks in the list to be processed in the non-target time period.
Further, the YARN-based resource monitoring method further includes:
if the executed task is detected, storing running log information of the executed task to a task running log database, wherein the running log information at least comprises running starting time information, running total duration and resource use information of the executed task, and the executed task comprises a completely-executed task and a terminated abnormal task.
Further, the YARN-based resource monitoring method further includes:
adjusting the running duration threshold and/or the preset resource threshold at intervals of a second preset duration according to the running log information in the task running log database.
Further, the YARN-based resource monitoring method further includes:
counting, at intervals of a third preset duration, the running start time information of the executed tasks in the task running log database, and adjusting the target time period accordingly.
A second aspect of the embodiments of the present application provides a resource monitoring apparatus based on YARN, which is characterized by including:
a running state information acquiring unit, configured to acquire, within a target time period and at intervals of a first preset duration, running state information of each executing target task through a YARN interface, wherein the running state information comprises the running duration and the size of the resources occupied by running;
and an abnormal task termination unit, configured to terminate an abnormal task if the abnormal task is detected among the target tasks, wherein the abnormal task is a task whose running duration exceeds a running duration threshold and/or whose occupied resource size exceeds a preset resource threshold.
A third aspect of the embodiments of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the computer program is executed by the processor, the terminal device is enabled to implement the steps of the YARN-based resource monitoring method.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, causes a terminal device to implement the steps of the YARN-based resource monitoring method described above.
A fifth aspect of the embodiments of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the steps of the YARN-based resource monitoring method according to the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages: within the target time period, the running state information of each executing target task is acquired automatically at intervals through the YARN interface, and any abnormal task that has run for too long or occupies too many resources is terminated. This prevents abnormal tasks from occupying the running resources of the distributed system for a long time, ensures that sufficient running resources remain available to other tasks during the target time period, and thereby improves the resource utilization rate and the task running efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flow chart illustrating an implementation of a first YARN-based resource monitoring method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart illustrating an implementation of a second YARN-based resource monitoring method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a YARN-based resource monitoring apparatus according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Example one:
Fig. 1 shows a schematic flowchart of a first YARN-based resource monitoring method according to an embodiment of the present application. The execution subject of this embodiment is a terminal device that can access the YARN interface of a YARN-based distributed system. The method is detailed as follows:
In S101, within a target time period, the running state information of each executing target task is acquired through the YARN interface at intervals of a first preset duration, where the running state information includes the running duration and the size of the resources occupied by running.
The YARN interface in the embodiment of the present application specifically refers to the Representational State Transfer application programming interface (REST API) provided by the Hadoop YARN web services, which is used to acquire information about the cluster, nodes and applications of a YARN-based distributed system. The YARN-based distributed system comprises a Hadoop cluster and YARN, which manages the cluster resources. All tasks (jobs) submitted to the YARN-based distributed system ultimately run in the cluster in the form of applications, so acquiring the information of each application through the YARN interface yields the information of each executing task.
In the embodiment of the present application, within the target time period, the batch workflow task scheduler Azkaban calls the YARN interface at intervals of the first preset duration to obtain the running state information of each executing target task. The call returns the application identification (AppID) of each executing target task together with its running duration and the size of the resources occupied by running. The target time period is specifically a time period set according to when task processing is busy (for example, a busy business period from 9 a.m. to 2 p.m. every day), and the first preset duration may be set according to the monitoring frequency actually required, for example 5 minutes.
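The polling step in S101 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: it assumes the ResourceManager REST endpoint `/ws/v1/cluster/apps?states=RUNNING` and its `id`, `elapsedTime` (milliseconds) and `allocatedMB` response fields. The names `TaskStatus`, `parse_running_apps`, `in_target_period` and the sample payload are hypothetical, and the HTTP request itself is left out so that the example runs without a cluster.

```python
import json
from dataclasses import dataclass
from datetime import time

# Assumed endpoint; host and port are placeholders, not values from the source:
#   GET http://<rm-host>:<port>/ws/v1/cluster/apps?states=RUNNING


@dataclass
class TaskStatus:
    app_id: str             # AppID returned by the YARN interface
    running_seconds: float  # running duration
    allocated_mb: int       # size of the resources occupied by running


def in_target_period(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the target period [start, end)."""
    return start <= now < end


def parse_running_apps(payload: str) -> list[TaskStatus]:
    """Extract AppID, running duration and allocated memory from a JSON body."""
    body = json.loads(payload)
    # The "apps" value may be null when nothing is running.
    apps = (body.get("apps") or {}).get("app", [])
    return [
        TaskStatus(a["id"], a["elapsedTime"] / 1000.0, a["allocatedMB"])
        for a in apps
    ]


# Hypothetical response body with two running applications.
SAMPLE = json.dumps({"apps": {"app": [
    {"id": "application_1_0001", "elapsedTime": 7_500_000, "allocatedMB": 2048},
    {"id": "application_1_0002", "elapsedTime": 60_000, "allocatedMB": 256},
]}})

# Poll only inside the target period (9 a.m. to 2 p.m. in the example above).
if in_target_period(time(10, 30), time(9, 0), time(14, 0)):
    for status in parse_running_apps(SAMPLE):
        print(status)
```

In a real deployment the scheduler would repeat this call at intervals of the first preset duration (5 minutes in the example above) and pass the parsed records to the check in S102.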
In S102, if it is detected that an abnormal task exists in the target tasks, the abnormal task is terminated, wherein the abnormal task is a task with the running time length exceeding a running time length threshold value and/or the size of the resources occupied by the running task exceeding a preset resource threshold value.
After the running state information of each executing target task is acquired, the running duration of each target task is compared with a pre-stored running duration threshold, and/or the size of the resources occupied by each target task is compared with a preset resource threshold. If the running duration of a target task exceeds the running duration threshold and/or the size of the resources it occupies exceeds the preset resource threshold, the target task is judged to be an abnormal task and is terminated. Specifically, the AppID of the abnormal task is determined from the AppID information returned when the running state information of the target task was acquired, and the YARN kill instruction is called with that AppID to clean up the process of the abnormal task, thereby terminating it. The running duration threshold and the preset resource threshold may be set in advance according to actual needs; for example, the running duration threshold may be 1 hour or 2 hours, and the preset resource threshold may be 500 MB or 1 GB.
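A minimal sketch of the check in S102, assuming a running duration threshold of 1 hour and a resource threshold of 1024 MB chosen from the example values above. `is_abnormal` and `kill_command` are hypothetical names. The command that is built is the standard `yarn application -kill <Application ID>` CLI call; here it is only constructed and printed, not executed.

```python
from typing import Optional


def is_abnormal(
    running_seconds: float,
    allocated_mb: float,
    max_seconds: Optional[float] = None,
    max_mb: Optional[float] = None,
) -> bool:
    """Apply the and/or rule: a task is abnormal if it exceeds any configured threshold."""
    too_long = max_seconds is not None and running_seconds > max_seconds
    too_big = max_mb is not None and allocated_mb > max_mb
    return too_long or too_big


def kill_command(app_id: str) -> list[str]:
    """Build the YARN CLI call that terminates the application with this AppID."""
    return ["yarn", "application", "-kill", app_id]


# Hypothetical running state records: (AppID, running seconds, allocated MB).
tasks = [("application_1_0001", 7500.0, 2048), ("application_1_0002", 60.0, 256)]

for app_id, seconds, mb in tasks:
    if is_abnormal(seconds, mb, max_seconds=3600, max_mb=1024):
        print(" ".join(kill_command(app_id)))
```

Leaving either threshold as `None` disables that half of the rule, which mirrors the "and/or" wording. To actually terminate a task, the list returned by `kill_command` could be passed to `subprocess.run` on a host where the `yarn` client is installed.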
Optionally, the target task is specifically a task outside of a task white list, and correspondingly before step S101, the method further includes:
and setting a task white list.
In the embodiment of the application, the task white list contains a number of designated tasks. These are important tasks that must run stably and must not be cleaned up by the system: regardless of whether they run for a long time or occupy a large amount of memory, they are not regarded as abnormal tasks. Tasks in the task white list are therefore neither monitored nor cleaned up even within the target time period; that is, the target tasks monitored in the target time period are the tasks outside the task white list.
Before step S101, a white list setting instruction may be received to set a task white list. Specifically, task identifiers carried in a white list setting instruction are obtained and stored in a task white list, and one task identifier is uniquely corresponding to one designated task.
Specifically, after the task white list is set, only the target tasks outside the task white list are scanned within the target time period in S101, and the running state information of each executing target task is acquired through the YARN interface at intervals of the first preset duration. Alternatively, in S101, all executing tasks are scanned within the target time period, and the tasks whose task identifiers match the task white list are removed from them to obtain the target tasks.
In the embodiment of the application, the task white list is set in advance and excluded from the monitored target task, so that the specified important task can be ensured to stably run and is not cleared as an abnormal task.
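The second variant described above (scan everything, then remove white-listed identifiers) can be sketched as a simple filter. The function name `select_target_tasks` and the sample identifiers are hypothetical.

```python
def select_target_tasks(running_ids, white_list):
    """Return the running task identifiers that are not on the task white list."""
    protected = set(white_list)  # set lookup keeps the filter linear in the task count
    return [task_id for task_id in running_ids if task_id not in protected]


running = ["daily_settlement", "adhoc_query_17", "risk_model_refresh"]
white_list = ["daily_settlement"]

print(select_target_tasks(running, white_list))
```

Only the identifiers returned by the filter are then compared against the thresholds, so a white-listed task is never terminated however long it runs.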
Optionally, before the step S101, the method further includes:
receiving a setting instruction, setting the target time period and the first preset duration according to the setting instruction, and setting the running duration threshold and/or the preset resource threshold according to the setting instruction.
That is, the target time period, the first preset duration, and the running duration threshold and/or the preset resource threshold are set according to the setting information carried in the received setting instruction. The target time period setting information may be determined from the busy time period in which, according to historical running information, the number of running tasks exceeds a preset number (the target time period may be set to that busy time period). The running duration threshold setting information may be determined from the average duration that tasks need to run to completion according to historical running information (the threshold may be set slightly greater than the average duration a typical task needs). The preset resource threshold setting information may be determined from the average size of the resources occupied by running tasks according to historical running information (the threshold may be set slightly greater than the average resource size a typical task occupies).
Optionally, after the step S102, the method further includes:
storing the information of the abnormal task into a list to be processed;
and executing the tasks in the list to be processed in the non-target time period.
After the abnormal task is terminated, storing the information of the abnormal task into a to-be-processed list, for example, storing the task identifier of the abnormal task and the original task request information of the abnormal task into the to-be-processed list.
And then, in a non-target time period, namely a non-busy time period, acquiring the information of the abnormal tasks in the list to be processed, and executing the tasks in the list to be processed according to the information of the abnormal tasks.
In the embodiment of the application, the information of the abnormal tasks that were terminated in the target time period is stored in the to-be-processed list, and these tasks, which run for too long and/or occupy too many resources, are executed in the non-target time period when the system is not busy. Tasks that consume a large amount of running resources are thus shifted to off-peak hours, so that other tasks can run normally and efficiently in the busy target time period while the abnormal tasks are still executed effectively in the idle non-target time period. This greatly improves the resource utilization rate and the task running efficiency of the YARN-based distributed system.
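The to-be-processed list can be sketched as a small queue that is only drained outside the target time period. This is a sketch under assumptions: `PendingTask`, `PendingList` and `drain_if_off_peak` are hypothetical names, the request payload is invented sample data, and resubmission is represented by a caller-supplied `submit` callback because the source does not specify how tasks are resubmitted.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class PendingTask:
    task_id: str
    request: dict  # original task request information


@dataclass
class PendingList:
    items: list = field(default_factory=list)

    def add(self, task_id, request):
        """Store the information of a terminated abnormal task."""
        self.items.append(PendingTask(task_id, request))

    def drain_if_off_peak(self, now, start, end, submit):
        """Resubmit every stored task when `now` is outside the target period [start, end)."""
        if start <= now < end:
            return 0  # still busy: keep the tasks queued
        drained, self.items = self.items, []
        for task in drained:
            submit(task)
        return len(drained)


pending = PendingList()
pending.add("task-42", {"script": "nightly_report.sql"})

resubmitted = []
pending.drain_if_off_peak(time(11, 0), time(9, 0), time(14, 0), resubmitted.append)
pending.drain_if_off_peak(time(22, 0), time(9, 0), time(14, 0), resubmitted.append)
print([task.task_id for task in resubmitted])
```

The first call falls inside the busy period and leaves the queue untouched; the second call is off-peak and hands the stored task to `submit`.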
Compared with the prior art, the embodiments of the present application have the following advantages: within the target time period, the running state information of each executing target task is acquired automatically at intervals through the YARN interface, and any abnormal task that has run for too long or occupies too many resources is terminated. This prevents abnormal tasks from occupying the running resources of the distributed system for a long time, reduces system overload, ensures that sufficient running resources remain available to other tasks during the target time period, and thereby improves the resource utilization rate and the task running efficiency.
Example two:
Fig. 2 shows a flowchart of a second YARN-based resource monitoring method according to an embodiment of the present application. The execution subject of this embodiment is a terminal device that can access the YARN interface of a YARN-based distributed system. The method is detailed as follows:
The YARN-based resource monitoring method in this embodiment includes steps S101 to S102 as described in the first embodiment, and additionally includes a step of storing running log information. Steps S101 to S102 are exactly the same as in the first embodiment and are not described again here. The running log information storage step is detailed as follows:
S201: if an executed task is detected, storing running log information of the executed task to a task running log database, wherein the running log information at least comprises running start time information, total running duration and resource use information of the executed task, and the executed task comprises a completely-executed task and a terminated abnormal task.
Step S201 is executed continuously in both the target time period and the non-target time period. When an executed task is detected in the system, the running log information of the executed task is acquired and stored in the task running log database. The running log information at least includes the running start time information, the total running duration information and the resource use information of the executed task. The resource use information may include the size of the resources occupied by running, the identification number of the specific resources used, the size of the remaining resources of the whole system at the time execution ended, and the like. The executed tasks comprise normal tasks that ran to completion and abnormal tasks that were terminated; different marks may be attached to the two kinds of task when they are stored so that they can be told apart.
In the embodiment of the application, the running log information of the executed task is stored in the task running log database, so that all historical task execution information in the system can be stored, and later query and statistics are facilitated.
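One possible shape for the task running log database is sketched below with Python's built-in `sqlite3` module. The source does not name a database product or schema, so the table `task_run_log`, its columns, the `outcome` mark and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE task_run_log (
        app_id        TEXT PRIMARY KEY,
        start_time    TEXT NOT NULL,     -- running start time (ISO 8601)
        total_seconds REAL NOT NULL,     -- total running duration
        allocated_mb  INTEGER NOT NULL,  -- resources occupied by running
        outcome       TEXT NOT NULL CHECK (outcome IN ('completed', 'terminated'))
    )
    """
)


def log_run(conn, app_id, start_time, total_seconds, allocated_mb, outcome):
    """Store one executed task; `outcome` marks normal versus terminated tasks."""
    conn.execute(
        "INSERT INTO task_run_log VALUES (?, ?, ?, ?, ?)",
        (app_id, start_time, total_seconds, allocated_mb, outcome),
    )


# Hypothetical sample rows.
log_run(conn, "application_1_0001", "2020-01-06T09:15:00", 7500.0, 2048, "terminated")
log_run(conn, "application_1_0002", "2020-01-06T10:05:00", 60.0, 256, "completed")

completed = conn.execute(
    "SELECT COUNT(*) FROM task_run_log WHERE outcome = 'completed'"
).fetchone()[0]
print(completed)
```

Keeping the mark in its own column lets later statistics select only the normal tasks, or only the terminated ones, with a simple `WHERE` clause.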
Optionally, the YARN-based resource monitoring method according to the embodiment of the present application further includes:
S2021: adjusting the running duration threshold and/or the preset resource threshold at intervals of a second preset duration according to the running log information in the task running log database.
In this embodiment of the application, at intervals of the second preset duration, the running log information for a period of time (whose length may equal the second preset duration) may be obtained from the task running log database and used to adjust the running duration threshold and/or the preset resource threshold used in step S102. The second preset duration may be one day, one week, two weeks or one month, and is not limited herein. For example, every two weeks, the running log information of the past two weeks is obtained from the task running log database, the average duration of all normal tasks that finished running in the past two weeks is computed, and the running duration threshold is adjusted to a value slightly larger than that average; or the average occupied resource size of all normal tasks that finished running in the past two weeks is computed, and the preset resource threshold is adjusted to a value slightly larger than that average. Alternatively, every two weeks, the number of abnormal tasks in the past two weeks and the size of the remaining resources of the whole system at the end time of each abnormal task are counted from the task running log database. If the number of abnormal tasks is larger than a first preset threshold and the average size of the remaining resources of the whole system at the end times of the abnormal tasks is larger than a second preset threshold, the current running duration threshold or preset resource threshold is set too low, so that a large number of tasks are judged to be abnormal even though the system has enough remaining resources. In that case the running duration threshold and/or the preset resource threshold is increased appropriately (by a preset step value).
In the embodiment of the application, the operation duration threshold value and/or the preset resource threshold value can be adjusted according to the operation log information stored in the operation log database, so that the subsequent abnormal task judgment can better meet the actual situation, and the resource utilization rate and the task operation efficiency of the YARN-based distributed system can be more accurately and effectively improved.
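The first adjustment rule in S2021 (set the threshold slightly above the recent average of normal tasks) can be sketched as below. The source does not quantify "slightly larger", so the `margin` factor of 1.2 is an assumption, as are the function name `adjusted_threshold` and the sample durations.

```python
from statistics import mean


def adjusted_threshold(values, current, margin=1.2):
    """Return the mean of `values` scaled by `margin`; keep `current` if there is no data."""
    if not values:
        return current
    return mean(values) * margin


# Hypothetical durations (seconds) of normal tasks completed in the last window.
durations = [1200.0, 1800.0, 3000.0]
new_threshold = adjusted_threshold(durations, current=3600.0)
print(new_threshold)
```

The same function can be applied to the occupied resource sizes to derive the preset resource threshold. Falling back to the current value when the window contains no completed tasks avoids resetting the threshold on an empty sample.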
Optionally, the YARN-based resource monitoring method according to the embodiment of the present application further includes:
S2022: counting, at intervals of a third preset duration, the running start time information of the executed tasks in the task running log database, and adjusting the target time period accordingly.
At intervals of the third preset duration, the running start time information of all executed tasks in the task running log database is counted, the time period in which the number of running tasks is greater than a third preset threshold is determined to be the busy time period, and the target time period is adjusted to that busy time period. The third preset duration may be one day, one week, two weeks or one month, and may be the same as or different from the second preset duration, which is not limited herein.
In the embodiment of the application, the target time period can be accurately determined as the busy time period according to the running starting time information of the tasks which are finished by historical execution, so that the resource use monitoring in the busy time period can be accurately realized, the long-time occupation of the running resources in the distributed system by the abnormal tasks in the busy time period is avoided, and the resource utilization rate and the task running efficiency are improved.
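A minimal sketch of the statistics in S2022, assuming that busyness is measured per hour of day and approximated by how many tasks started in that hour. The hourly bucket size, the names `busy_hours` and `to_period`, the threshold value and the sample start times are all assumptions; the source only states that start times are counted and compared with a third preset threshold.

```python
from collections import Counter
from datetime import datetime


def busy_hours(start_times, min_tasks):
    """Return the sorted hours of day in which more than `min_tasks` tasks started."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in start_times)
    return sorted(hour for hour, n in counts.items() if n > min_tasks)


def to_period(hours):
    """Collapse busy hours into one (start_hour, end_hour) span, end exclusive."""
    return (hours[0], hours[-1] + 1) if hours else None


# Hypothetical running start times read from the task running log database.
STARTS = [
    "2020-01-06T09:10:00", "2020-01-06T09:40:00",
    "2020-01-06T10:05:00", "2020-01-06T10:30:00",
    "2020-01-06T13:20:00", "2020-01-06T13:50:00",
    "2020-01-06T22:00:00",
]

hours = busy_hours(STARTS, min_tasks=1)
print(hours, to_period(hours))
```

Collapsing the busy hours into a single span is a simplification: quiet hours between two busy hours are included in the target time period. A deployment that needs several disjoint busy periods would keep the list of hours instead.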
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example three:
Fig. 3 shows a schematic structural diagram of a YARN-based resource monitoring apparatus according to an embodiment of the present application. For convenience of description, only the parts related to the embodiment of the present application are shown:
The YARN-based resource monitoring apparatus comprises a running state information acquiring unit 31 and an abnormal task termination unit 32, wherein:
the running state information acquiring unit 31 is configured to acquire, within a target time period and at intervals of a first preset duration, running state information of each executing target task through a YARN interface, where the running state information includes the running duration and the size of the resources occupied by running;
an abnormal task termination unit 32, configured to terminate the abnormal task if it is detected that an abnormal task exists in the target task, where the abnormal task is a task whose operation duration exceeds an operation duration threshold and/or whose size of resources occupied by operation exceeds a preset resource threshold.
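The detection rule applied by unit 32 can be sketched in Python as follows. The sketch is only an illustration: it assumes the running state information has already been fetched as a list of dictionaries with the fields `id`, `elapsedTime` (milliseconds), and `allocatedMB`, which is how the YARN ResourceManager REST API is understood to report applications. The function name `find_abnormal` and the threshold values are hypothetical, and either threshold may be left unset to model the "and/or" wording.

```python
def find_abnormal(apps, max_elapsed_ms=None, max_memory_mb=None):
    """Return the IDs of tasks that exceed the duration and/or resource threshold."""
    abnormal = []
    for app in apps:
        # A threshold set to None is not checked.
        too_long = max_elapsed_ms is not None and app["elapsedTime"] > max_elapsed_ms
        too_big = max_memory_mb is not None and app["allocatedMB"] > max_memory_mb
        if too_long or too_big:
            abnormal.append(app["id"])
    return abnormal


# Hypothetical snapshot of the running target tasks.
apps = [
    {"id": "application_1_0001", "elapsedTime": 7_200_000, "allocatedMB": 4096},
    {"id": "application_1_0002", "elapsedTime": 600_000, "allocatedMB": 65536},
    {"id": "application_1_0003", "elapsedTime": 300_000, "allocatedMB": 2048},
]

# One hour and 32 GB are illustrative thresholds only.
print(find_abnormal(apps, max_elapsed_ms=3_600_000, max_memory_mb=32768))
```

Each returned ID would then be passed to whatever termination mechanism the deployment uses, for example the `yarn application -kill <application ID>` command.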
Optionally, the target tasks are specifically tasks outside a task whitelist, and the YARN-based resource monitoring apparatus further includes:
a first setting unit, configured to set the task whitelist.
Optionally, the YARN-based resource monitoring apparatus further includes:
a second setting unit, configured to receive a setting instruction, set the target time period and the first preset duration according to the setting instruction, and set the running duration threshold and/or the preset resource threshold according to the setting instruction.
Optionally, the YARN-based resource monitoring apparatus further includes:
an abnormal task processing unit, configured to store information about the abnormal task in a pending list, and to execute the tasks in the pending list during a non-target time period.
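The pending-list behaviour can be sketched as follows, assuming the target time period is a daily window given by a start and end time of day. The class `PendingList`, the helper `in_target_period`, and the `resubmit` callback are hypothetical names introduced for illustration; how a terminated task is actually resubmitted is not specified here.

```python
from datetime import time


def in_target_period(now, start, end):
    """True if now falls in [start, end); also handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


class PendingList:
    """Holds terminated abnormal tasks until the target time period is over."""

    def __init__(self):
        self._tasks = []

    def add(self, task_id):
        # Avoid queuing the same task twice.
        if task_id not in self._tasks:
            self._tasks.append(task_id)

    def drain_if_idle(self, now, start, end, resubmit):
        """Resubmit all pending tasks if now is outside the target period."""
        if in_target_period(now, start, end):
            return []
        done, self._tasks = self._tasks, []
        for task_id in done:
            resubmit(task_id)
        return done


pending = PendingList()
pending.add("application_1_0001")
# Inside the 09:00-18:00 window nothing is resubmitted.
print(pending.drain_if_idle(time(10, 0), time(9, 0), time(18, 0), print))
# Outside the window the pending task is handed to the resubmit callback.
print(pending.drain_if_idle(time(20, 0), time(9, 0), time(18, 0), print))
```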
Optionally, the YARN-based resource monitoring apparatus further includes:
a running log information storage unit, configured to, upon detecting a finished task, store running log information of that task in a task running log database, where the running log information includes at least the running start time information, total running duration, and resource usage information of the finished task, and finished tasks include both tasks that ran to completion and abnormal tasks that were terminated.
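One possible shape for the task running log database is sketched below using Python's built-in `sqlite3` module. The table name `task_run_log`, its columns, and the helper names are assumptions made for illustration; the application only requires that the start time, total running duration, and resource usage be recorded for both completed and terminated tasks.

```python
import sqlite3


def open_log_db(path=":memory:"):
    """Open (or create) a hypothetical task running log database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS task_run_log (
               task_id       TEXT PRIMARY KEY,
               start_time    TEXT NOT NULL,     -- ISO 8601 running start time
               total_seconds REAL NOT NULL,     -- total running duration
               memory_mb     INTEGER NOT NULL,  -- resource usage
               terminated    INTEGER NOT NULL   -- 1 if killed as abnormal
           )"""
    )
    return conn


def record_finished_task(conn, task_id, start_time, total_seconds, memory_mb, terminated):
    """Store one finished task, whether it completed or was terminated."""
    conn.execute(
        "INSERT OR REPLACE INTO task_run_log VALUES (?, ?, ?, ?, ?)",
        (task_id, start_time, total_seconds, memory_mb, int(terminated)),
    )
    conn.commit()


conn = open_log_db()
record_finished_task(conn, "application_1_0001", "2020-02-26T09:05:00", 7200.0, 4096, True)
record_finished_task(conn, "application_1_0003", "2020-02-26T09:40:00", 300.0, 2048, False)
print(conn.execute("SELECT COUNT(*) FROM task_run_log").fetchone()[0])
```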
Optionally, the YARN-based resource monitoring apparatus further includes:
a first adjusting unit, configured to adjust the running duration threshold and/or the preset resource threshold every second preset duration according to the running log information in the task running log database.
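This passage does not say how the thresholds are derived from the log. As one possible rule, offered purely as an assumption, the running duration threshold could be set to the mean of the logged total running durations plus a multiple of their standard deviation. The function name `adjust_duration_threshold` and the factor `k` are hypothetical.

```python
import statistics


def adjust_duration_threshold(durations, current, k=3.0):
    """Return a new running duration threshold: mean + k * population std deviation.

    Keeps the current threshold when there is too little history to estimate from.
    """
    if len(durations) < 2:
        return current
    return statistics.mean(durations) + k * statistics.pstdev(durations)


# Hypothetical total running durations (seconds) read from the log database.
history = [90, 110]
print(adjust_duration_threshold(history, current=3600))
```

The same pattern could be applied to logged resource usage to adjust the preset resource threshold.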
Optionally, the YARN-based resource monitoring apparatus further includes:
a second adjusting unit, configured to compile statistics, every third preset duration, on the running start time information of the finished tasks in the task running log database, and to adjust the target time period.
It should be noted that, because the information exchange and execution processes between the above apparatuses/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments and are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practical applications, the functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Example four:
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present application. As shown in Fig. 4, the terminal device 4 of this embodiment includes a processor 40, a memory 41, and a computer program 42, such as a YARN-based resource monitoring program, stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps of the YARN-based resource monitoring method embodiments described above, such as steps S101 to S102 shown in Fig. 1. Alternatively, when executing the computer program 42, the processor 40 implements the functions of the modules/units in the apparatus embodiments described above, such as the functions of units 31 to 32 shown in Fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and these segments describe how the computer program 42 executes in the terminal device 4. For example, the computer program 42 may be divided into a running state information acquisition unit and an abnormal task termination unit, whose specific functions are as follows:
the running state information acquisition unit is configured to acquire, within a target time period and once every first preset duration, running state information of each target task being executed through the YARN interface, where the running state information includes the running duration and the amount of resources occupied by the run;
the abnormal task termination unit is configured to terminate an abnormal task if one is detected among the target tasks, where an abnormal task is a task whose running duration exceeds a running duration threshold and/or whose occupied resources exceed a preset resource threshold.
The terminal device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will appreciate that Fig. 4 is merely an example of the terminal device 4 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, it may also include input/output devices, network access devices, buses, and the like.
The processor 40 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or internal memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used to store the computer program and the other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. The apparatus/terminal device embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented as software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the method embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be expanded or restricted as appropriate to meet the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice provide that computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A YARN-based resource monitoring method, characterized by comprising:
within a target time period, acquiring running state information of each target task being executed through a YARN interface every first preset duration, wherein the running state information comprises the running duration and the amount of resources occupied by the run;
if an abnormal task is detected among the target tasks, terminating the abnormal task, wherein the abnormal task is a task whose running duration exceeds a running duration threshold and/or whose occupied resources exceed a preset resource threshold.
2. The YARN-based resource monitoring method of claim 1, wherein the target tasks are specifically tasks outside a task whitelist, and before the acquiring, within the target time period, of the running state information of each target task being executed through the YARN interface every first preset duration, the method further comprises:
setting the task whitelist.
3. The YARN-based resource monitoring method of claim 1, wherein before the acquiring, within the target time period, of the running state information of each target task being executed through the YARN interface every first preset duration, the method further comprises:
receiving a setting instruction, setting the target time period and the first preset duration according to the setting instruction, and setting the running duration threshold and/or the preset resource threshold according to the setting instruction.
4. The YARN-based resource monitoring method of claim 1, wherein after the terminating of the abnormal task if an abnormal task is detected among the target tasks, the method further comprises:
storing information about the abnormal task in a pending list;
executing the tasks in the pending list during a non-target time period.
5. The YARN-based resource monitoring method of claim 1, further comprising:
if a finished task is detected, storing running log information of the finished task in a task running log database, wherein the running log information comprises at least the running start time information, total running duration, and resource usage information of the finished task, and finished tasks comprise tasks that ran to completion and abnormal tasks that were terminated.
6. The YARN-based resource monitoring method of claim 5, wherein the method further comprises:
adjusting the running duration threshold and/or the preset resource threshold every second preset duration according to the running log information in the task running log database.
7. The YARN-based resource monitoring method of claim 5, wherein the method further comprises:
compiling statistics, every third preset duration, on the running start time information of the finished tasks in the task running log database, and adjusting the target time period.
8. A YARN-based resource monitoring apparatus, characterized by comprising:
a running state information acquisition unit, configured to acquire, within a target time period and once every first preset duration, running state information of each target task being executed through a YARN interface, wherein the running state information comprises the running duration and the amount of resources occupied by the run;
an abnormal task termination unit, configured to terminate an abnormal task if one is detected among the target tasks, wherein the abnormal task is a task whose running duration exceeds a running duration threshold and/or whose occupied resources exceed a preset resource threshold.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the computer program, when executed by the processor, causes the terminal device to carry out the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes a terminal device to carry out the steps of the method according to any one of claims 1 to 7.
CN202010120079.7A 2020-02-26 2020-02-26 YARN-based resource monitoring method and device and terminal equipment Pending CN111290917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010120079.7A CN111290917A (en) 2020-02-26 2020-02-26 YARN-based resource monitoring method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN111290917A true CN111290917A (en) 2020-06-16

Family

ID=71023219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010120079.7A Pending CN111290917A (en) 2020-02-26 2020-02-26 YARN-based resource monitoring method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111290917A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170084445A (en) * 2016-01-12 2017-07-20 삼성에스디에스 주식회사 Method and apparatus for detecting abnormality using time-series data
CN108021450A (en) * 2017-12-04 2018-05-11 北京小度信息科技有限公司 Job analysis method and apparatus based on YARN
CN108874535A (en) * 2018-05-14 2018-11-23 中国平安人寿保险股份有限公司 A kind of task adjusting method, computer readable storage medium and terminal device
CN110297746A (en) * 2019-07-05 2019-10-01 北京慧眼智行科技有限公司 A kind of data processing method and system
CN110489301A (en) * 2019-08-22 2019-11-22 上海中通吉网络技术有限公司 Analysis method, device and the equipment of mapreduce mission performance
CN110597621A (en) * 2019-08-09 2019-12-20 苏宁金融科技(南京)有限公司 Method and system for scheduling cluster resources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
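Pan Jiayi; Wang Fang; Yang Jingyi; Tan Zhipeng: "Load-adaptive feedback scheduling strategy for heterogeneous Hadoop clusters" (异构Hadoop集群下的负载自适应反馈调度策略), Computer Engineering & Science (计算机工程与科学), no. 03, 15 March 2017 (2017-03-15), pages 12 - 22 *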

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269643A (en) * 2020-10-10 2021-01-26 北京浪潮数据技术有限公司 Resource monitoring method and device, electronic equipment and storage medium
CN112559859A (en) * 2020-12-08 2021-03-26 杭州海康威视系统技术有限公司 Resource recommendation method and device, electronic equipment and machine-readable storage medium
CN112488412A (en) * 2020-12-11 2021-03-12 北京字跳网络技术有限公司 Duration information determination method and device, electronic equipment and computer storage medium
CN112559292A (en) * 2020-12-18 2021-03-26 北京北方华创微电子装备有限公司 Equipment application monitoring method and semiconductor process equipment
CN112508449B (en) * 2020-12-21 2023-06-30 北京元心科技有限公司 Task execution method, device, electronic equipment and computer readable storage medium
CN112508449A (en) * 2020-12-21 2021-03-16 北京元心科技有限公司 Task execution method and device, electronic equipment and computer readable storage medium
CN113064723A (en) * 2021-03-23 2021-07-02 瀚云科技有限公司 Storage medium, electronic device, bus resource allocation method and device
CN113064723B (en) * 2021-03-23 2024-05-24 瀚云科技有限公司 Storage medium, electronic device, bus resource allocation method and device
CN113094197A (en) * 2021-04-09 2021-07-09 中国工商银行股份有限公司 Method, device, equipment and storage medium for judging instruction submission abnormity
CN113268389A (en) * 2021-06-09 2021-08-17 无锡炫我科技有限公司 Abnormal node monitoring method and device, electronic equipment and readable storage medium
CN113239243A (en) * 2021-07-08 2021-08-10 湖南星汉数智科技有限公司 Graph data analysis method and device based on multiple computing platforms and computer equipment
CN113468036A (en) * 2021-07-15 2021-10-01 上海晶赞融宣科技有限公司 Task execution time consumption analysis method and device, storage medium and terminal
CN113468036B (en) * 2021-07-15 2023-11-24 上海晶赞融宣科技有限公司 Time-consuming analysis method and device for task execution, storage medium and terminal
CN113778803A (en) * 2021-09-13 2021-12-10 深圳市酷开网络科技股份有限公司 Task resource monitoring system, method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination