CN112445641B - Operation maintenance method and system for big data cluster

Info

Publication number: CN112445641B
Application number: CN202011225109.7A
Authority: CN
Inventors: 李燕; 杨雪平; 宋彬彬; 杨雪; 周伟
Original assignee: Dezhou Vocational and Technical College
Current assignee: Dezhou Vocational and Technical College
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2022-08-26
Anticipated expiration: 2040-11-05
Also published as: CN112445641A

Abstract

The invention provides a method for operating and maintaining a big data cluster, which comprises the following steps: acquiring process information in a big data cluster to obtain process running information of each component in the big data cluster; setting an initial value of a process running information scanning time interval, and carrying out self-adaptive adjustment on the process running information scanning time interval according to the running information scanning condition; scanning whether a program error exists in each process of a tested assembly in the big data cluster by using the process running information; if the program error exists, extracting an error type corresponding to the program error, and carrying out error statistics; inquiring a corresponding repair strategy in a preset error code library according to the error type, and generating a repair instruction; and repairing the program error according to the repairing instruction and the repairing strategy. The system comprises modules corresponding to the method steps.

Description

Operation maintenance method and system for big data cluster

Technical Field

The invention provides a method and a system for operating and maintaining a big data cluster, and belongs to the technical field of operation and maintenance.

Background

Big data refers to data sets too large to be captured, managed, processed, and organized within a reasonable time by current mainstream software tools into information that helps an enterprise make better business decisions. Big data processing relies on many services, such as HDFS (a distributed file system), YARN (a resource management system), Spark (a distributed in-memory computing framework), HBase (a distributed column-oriented database), and Hive (a Hadoop-based data warehouse tool). Because of network fluctuation, unstable voltage, resource preemption, operator error, and other causes, some components may hang. Maintenance personnel must therefore inspect the running condition of the platform periodically and, when an abnormal program error is found, clear it and restart the hung service. If the service is not restarted in time, business data may back up and business operation may even be affected, which poses a great challenge to the stable operation of a big data platform. Moreover, because big data platforms are deployed at many sites and the same program errors recur frequently, operation and maintenance personnel must perform a large amount of repetitive work. Some big data platforms also prohibit remote operation because of permission restrictions, which makes inspection and program error repair very inconvenient for operation and maintenance personnel.

Disclosure of Invention

The invention provides a method and a system for operating and maintaining a big data cluster, which are used for solving the problems of low operation and maintenance efficiency and poor maintenance strength of the existing big data cluster, and adopt the following technical scheme:

a method for operation and maintenance of a big data cluster, the method comprising:

acquiring process information in a big data cluster to obtain process running information of each component in the big data cluster;

setting an initial value of a process running information scanning time interval, and carrying out self-adaptive adjustment on the process running information scanning time interval according to the running information scanning condition;

scanning whether a program error exists in each process of a tested assembly in the big data cluster by using the process running information; if the program error exists, extracting an error type corresponding to the program error, and carrying out error statistics;

inquiring a corresponding repair strategy in a preset error code library according to the error type, and generating a repair instruction;

and repairing the program error according to the repairing instruction and the repairing strategy.

Further, setting an initial value of a process running information scanning time interval, and performing adaptive adjustment on the process running information scanning time interval according to a running information scanning condition, wherein the adaptive adjustment comprises the following steps:

firstly, setting a scanning time interval initial value of process running information, and executing the scanning of each process of a tested component in a big data cluster by using the process running information according to the scanning time interval initial value;

secondly, process scanning of three continuous process running information scanning time intervals is carried out on the basis of the initial value of the process running information scanning time interval, namely three times of process scanning; after the three times of process scanning is finished, adjusting the process running information scanning time interval according to the time used by single scanning and the program error number in the process to obtain the process running information scanning time interval after self-adaptive adjustment;

thirdly, scanning each process of the tested assembly in the big data cluster by using the process running information according to the process running information scanning time interval after self-adaptive adjustment;

step four, after three times of scanning are continuously carried out according to the process running information scanning time interval after self-adaptive adjustment, the process running information scanning time interval is adjusted according to the time used by single scanning and the program error quantity in the process, and the process running information scanning time interval after self-adaptive adjustment is obtained again; scanning each process of the tested assembly in the big data cluster by using the process running information according to the process running information scanning time interval after the self-adaptive adjustment again;

and fifthly, repeating the contents from the third step to the fourth step, continuously adjusting the process running information scanning time interval, and scanning each process of the tested assembly in the big data cluster by using the continuously adjusted process running information scanning time interval.
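The five steps above can be sketched as a loop that scans three times, adjusts the interval, and repeats. This is only an illustrative sketch: the adjustment formula itself is not reproduced in this text, so `placeholder_adjust` is a hypothetical stand-in (it shrinks the interval with the observed error rate and never drops below the slowest single scan time). The names `run_adaptive_scans`, `ScanResult`, and the `sleep` hook are likewise assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical result of one scan: (elapsed_seconds, processes_scanned, errors_found).
ScanResult = Tuple[float, int, int]


def placeholder_adjust(t_i: float, n_procs: int, n_errors: int,
                       t_max: float, t_min: float) -> float:
    """Stand-in for the adjustment formula, which is NOT given in this text.

    Assumption for illustration only: shorten the interval in proportion to
    the error rate, but never below the slowest single scan time (t_max).
    t_min is accepted only to mirror the inputs the text names.
    """
    error_rate = n_errors / n_procs if n_procs else 0.0
    return max(t_max, t_i * (1.0 - error_rate))


def run_adaptive_scans(scan: Callable[[], ScanResult],
                       initial_interval: float,
                       rounds: int,
                       adjust: Callable[..., float] = placeholder_adjust,
                       sleep: Callable[[float], None] = lambda seconds: None
                       ) -> List[float]:
    """Run `rounds` batches of three scans and return every interval used."""
    interval = initial_interval
    history = [interval]
    for _ in range(rounds):
        batch = []
        for _ in range(3):                # three consecutive scans per adjustment
            batch.append(scan())
            sleep(interval)               # wait one scanning interval between scans
        times = [elapsed for elapsed, _, _ in batch]
        n_procs = sum(procs for _, procs, _ in batch)
        n_errors = sum(errors for _, _, errors in batch)
        interval = adjust(interval, n_procs, n_errors, max(times), min(times))
        history.append(interval)
    return history
```

In a real deployment `sleep` would be `time.sleep` and `scan` would inspect the cluster's processes; both are injected here so the loop can be exercised without waiting.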

Further, the process running information scanning time interval is adaptively adjusted according to the following formula:

wherein $T_{i+1}$ denotes the information scanning time interval after the $(i+1)$-th adaptive adjustment, $i = 1, 2, 3, \ldots, n$; $n$ denotes the total number of adaptive adjustments of the information scanning time interval; $T_1$ denotes the initial value of the process running information scanning time interval; $N$ denotes the number of processes scanned in three consecutive scans; $N_c$ denotes the number of program errors found in three consecutive scans; $T_i$ denotes the information scanning time interval after the $i$-th adaptive adjustment; $T_{max}$ denotes the maximum time taken by a single process scan among the three scans; and $T_{min}$ denotes the minimum time taken by a single process scan among the three scans.

Further, scanning whether a program error exists in each process of the tested assembly in the big data cluster by using the process running information; if the program error exists, extracting an error type corresponding to the program error, and performing error statistics, wherein the error statistics comprises the following steps:

when detecting that the process of the tested component has a program error, locking an error log corresponding to a program error trigger point according to the program error;

determining the error type according to the error log;

carrying out primary error marking on the process with the program error, and classifying the error type of the program error of the process;

counting the error marking times of the process of the tested component and various error types of the process to obtain a statistical result;

and sending the statistical result to an operation maintenance terminal of the big data cluster for recording.

Further, sending the statistical result to an operation maintenance terminal of the big data cluster for recording, including:

after receiving the statistical result, the operation maintenance terminal compares the statistical result with an error threshold corresponding to each tested component preset in the operation maintenance terminal:

and when, for any tested component, either the number of error marks or the number of error types in the statistical result exceeds the corresponding error marking count index or error type number index in the error threshold, the operation maintenance terminal issues an alarm prompt.

An operation maintenance system for large data clusters, the system comprising:

the acquisition module is used for acquiring process information in the big data cluster to obtain process running information of each component in the big data cluster;

the setting module is used for setting an initial value of a process running information scanning time interval and carrying out self-adaptive adjustment on the process running information scanning time interval according to the condition of carrying out running information scanning;

the judging module is used for scanning whether a program error exists in each process of the tested assembly in the big data cluster by using the process running information; if the program error exists, extracting an error type corresponding to the program error, and carrying out error statistics;

the generating module is used for inquiring a corresponding repairing strategy in a preset error code library according to the error type and generating a repairing instruction;

and the repairing module is used for repairing the program error according to the repairing instruction and the repairing strategy.

Further, the setting module includes:

the initial value setting module is used for setting a scanning time interval initial value of process running information and executing the scanning of each process of a tested component in a big data cluster by using the process running information according to the scanning time interval initial value;

the scanning module I is used for carrying out process scanning over three consecutive process running information scanning time intervals on the basis of the initial value of the process running information scanning time interval, namely three process scans; after the three process scans are finished, adjusting the process running information scanning time interval according to the time used by a single scan and the number of program errors found, to obtain the adaptively adjusted process running information scanning time interval; and scanning each process of the tested component in the big data cluster by using the process running information according to the adaptively adjusted process running information scanning time interval;

the self-adaptive adjustment module is used for adjusting the process running information scanning time interval according to the time used by single scanning and the program error quantity in the process after three times of scanning is continuously carried out at the process running information scanning time interval after self-adaptive adjustment, so as to obtain the process running information scanning time interval after self-adaptive adjustment again; scanning each process of the tested assembly in the big data cluster by using the process running information according to the process running information scanning time interval after the self-adaptive adjustment again; and continuously adjusting the process running information scanning time interval, and scanning each process of the tested assembly in the big data cluster by using the continuously adjusted process running information scanning time interval.

wherein $T_{i+1}$ denotes the information scanning time interval after the $(i+1)$-th adaptive adjustment, $i = 1, 2, 3, \ldots, n$; $n$ denotes the total number of adaptive adjustments of the information scanning time interval; $T_1$ denotes the initial value of the process running information scanning time interval; $N$ denotes the number of processes scanned in three consecutive scans; $N_c$ denotes the number of program errors found in three consecutive scans; $T_i$ denotes the information scanning time interval after the $i$-th adaptive adjustment; $T_{max}$ denotes the maximum time taken by a single process scan among the three scans; and $T_{min}$ denotes the minimum time taken by a single process scan among the three scans.

Further, the judging module comprises:

the locking module is used for locking an error log corresponding to the program error trigger point according to the program error when the program error of the process of the tested assembly is detected;

the type determining module is used for determining the error type according to the error log;

the marking module is used for marking the process with the program error once and classifying the error type of the program error of the process;

the statistical module is used for counting the error marking times of the process of the tested assembly and various error types of the process to obtain a statistical result;

and the recording module is used for sending the statistical result to an operation maintenance terminal of the big data cluster for recording.

Further, the recording module includes:

the comparison module is used for controlling the operation maintenance terminal to compare the statistical result with an error threshold corresponding to each tested component preset in the operation maintenance terminal after receiving the statistical result:

and the warning module is used for causing the operation maintenance terminal to issue an alarm prompt when, for any tested component, either the number of error marks or the number of error types in the statistical result exceeds the corresponding error marking count index or error type number index in the error threshold.

The invention has the beneficial effects that:

the operation maintenance method and system for a big data cluster can effectively improve the efficiency and strength of operation and maintenance. Setting and adaptively adjusting the process running information scanning time interval improves how well the scanning and the data acquisition frequency match the actual running condition of the big data cluster, so that the data acquisition frequency can be adjusted at any time as that condition changes. The operation and maintenance process of the whole big data cluster therefore follows changes in the cluster's actual running condition, which improves operation and maintenance efficiency and strength, allows operation and maintenance resources to be allocated reasonably, and reduces their waste.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a block diagram of the system of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

As shown in fig. 1, the operation maintenance method for a big data cluster according to an embodiment of the present invention includes:

s1, collecting process information in the big data cluster to obtain process running information of each component in the big data cluster;

s2, setting an initial value of a process running information scanning time interval, and carrying out self-adaptive adjustment on the process running information scanning time interval according to the running information scanning condition;

s3, scanning whether a program error exists in each process of the tested assembly in the big data cluster by using the process running information; if the program error exists, extracting an error type corresponding to the program error, and carrying out error statistics;

s4, inquiring a corresponding repair strategy in a preset error code library according to the error type, and generating a repair instruction;

and S5, repairing the program error according to the repairing instruction and the repairing strategy.
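Steps S3 to S5 can be illustrated with a small lookup sketch. Everything named here is hypothetical: the `ProcessInfo` record, the contents of `ERROR_CODE_LIBRARY`, the strategy names, and the `escalate_to_operator` fallback for error types missing from the library are assumptions made for the example, not details given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProcessInfo:
    component: str                     # e.g. "HDFS" or "YARN"
    pid: int
    error_type: Optional[str] = None   # None means no program error was detected


# Hypothetical preset error code library: error type -> repair strategy name.
ERROR_CODE_LIBRARY: Dict[str, str] = {
    "PROCESS_DOWN": "restart_service",
    "OUT_OF_MEMORY": "restart_with_larger_heap",
}


def build_repair_instructions(processes: List[ProcessInfo],
                              library: Dict[str, str]) -> List[dict]:
    """Turn scanned process records into repair instructions."""
    instructions = []
    for proc in processes:                        # S3: check each scanned process
        if proc.error_type is None:
            continue                              # healthy process, nothing to repair
        strategy = library.get(proc.error_type)   # S4: query the error code library
        if strategy is None:
            strategy = "escalate_to_operator"     # assumption: fallback for unknown types
        instructions.append({
            "component": proc.component,
            "pid": proc.pid,
            "error_type": proc.error_type,
            "strategy": strategy,                 # S5 would execute this strategy
        })
    return instructions
```

Keeping the library as plain data means new error types can be supported by adding entries rather than changing the scanning code.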

The effect of the above technical scheme is as follows: the efficiency and strength of operation and maintenance can be effectively improved. Setting and adaptively adjusting the process running information scanning time interval improves how well the scanning and the data acquisition frequency match the actual running condition of the big data cluster, so that the data acquisition frequency can be adjusted at any time as that condition changes. The operation and maintenance process of the whole big data cluster therefore follows changes in the cluster's actual running condition, which improves operation and maintenance efficiency and strength, allows operation and maintenance resources to be allocated reasonably, and reduces their waste.

In an embodiment of the present invention, setting an initial value of a process running information scanning time interval, and performing adaptive adjustment on the process running information scanning time interval according to a condition of performing running information scanning, includes:

firstly, setting an initial value of a scanning time interval of process running information, and executing scanning of each process of a tested assembly in a big data cluster by using the process running information according to the initial value of the scanning time interval;

step four, after three times of scanning is continuously carried out at the process running information scanning time interval after self-adaptive adjustment, the process running information scanning time interval is adjusted according to the time used by single scanning and the program error quantity in the process, and the process running information scanning time interval after self-adaptive adjustment is obtained again; scanning each process of the tested assembly in the big data cluster by using the process running information according to the process running information scanning time interval after the self-adaptive adjustment again;

The process running information scanning time interval is adaptively adjusted according to the following formula:

The effect of the above technical scheme is as follows: setting and adaptively adjusting the process running information scanning time interval improves how well the scanning and the data acquisition frequency match the actual running condition of the big data cluster, so that the data acquisition frequency can be adjusted at any time as that condition changes. The operation and maintenance process of the whole big data cluster therefore follows changes in the cluster's actual running condition, which improves operation and maintenance efficiency and strength, allows operation and maintenance resources to be allocated reasonably, and reduces their waste.

Meanwhile, while each tested device of the big data cluster is running, the number of processes differs from one time period to another because the number of executed tasks and the data volume of a single task differ, so the time needed to scan the processes also differs. The change in scanning time therefore indirectly reflects the task load of each tested device. Adaptively adjusting the process running information scanning time interval through the formula can greatly improve how well that interval matches the actual running condition and the program error rate of the big data cluster, so that the overall operation and maintenance activity closely matches the actual running condition of each tested device of the big data cluster.

In one embodiment of the present invention, the process running information is used to scan whether a program error exists in each process of the tested component in the big data cluster; if the program error exists, extracting an error type corresponding to the program error, and performing error statistics, wherein the error statistics comprises the following steps:

s301, when detecting that a program error occurs in the process of the tested component, locking an error log corresponding to a program error trigger point according to the program error;

s302, determining the error type according to the error log;

s303, carrying out primary error marking on the process with the program error, and classifying the error type of the program error of the process;

s304, counting the error marking times of the process of the tested assembly and various error types of the process to obtain a statistical result;

s305, sending the statistical result to an operation maintenance terminal of the big data cluster for recording.
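The marking and counting in S303 and S304 amount to keeping a per-component count of error marks and a per-component breakdown by error type. A minimal sketch follows, assuming each detected program error has already been reduced to a `(component, error_type)` pair; the function name `summarize_errors` and the shape of the returned dictionary are illustrative choices, not part of the described method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize_errors(events: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Aggregate (component, error_type) events into a statistical result."""
    marks: Counter = Counter()          # number of error marks per component
    by_type = defaultdict(Counter)      # component -> count per error type
    for component, error_type in events:
        marks[component] += 1                 # S303: one error mark per detected error
        by_type[component][error_type] += 1   # S303: classify the error by type
    # S304: the statistical result that would be sent to the maintenance terminal
    return {
        component: {
            "error_marks": marks[component],
            "by_type": dict(by_type[component]),
        }
        for component in marks
    }
```

For example, three HDFS events of which two share a type yield three error marks and two distinct error types for HDFS.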

Sending the statistical result to an operation maintenance terminal of the big data cluster for recording, wherein the recording comprises the following steps:

s3051, after receiving the statistical result, the operation and maintenance terminal compares the statistical result with an error threshold corresponding to each tested component preset in the operation and maintenance terminal: the error threshold comprises an error marking frequency index and an error type number index;

s3052, when, for any tested component, either the number of error marks or the number of error types in the statistical result exceeds the corresponding error marking count index or error type number index in the error threshold, the operation maintenance terminal issues an alarm prompt.
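The comparison in S3051 and S3052 can be sketched as a per-component check against two limits. The `ErrorThreshold` class, its field names, and the input shape (a dictionary per component holding an `error_marks` count and a `by_type` breakdown) are assumptions for illustration; skipping components that have no preset threshold is also an assumption, since the text does not say how that case is handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ErrorThreshold:
    max_error_marks: int   # error marking count index
    max_error_types: int   # error type number index


def components_to_alert(stats: Dict[str, dict],
                        thresholds: Dict[str, ErrorThreshold]) -> List[str]:
    """Return the components whose statistics exceed either index."""
    alerts = []
    for component, result in stats.items():
        limit = thresholds.get(component)
        if limit is None:
            continue  # assumption: no preset threshold means no alarm
        too_many_marks = result["error_marks"] > limit.max_error_marks
        too_many_types = len(result["by_type"]) > limit.max_error_types
        if too_many_marks or too_many_types:   # exceeding either index triggers an alarm
            alerts.append(component)
    return alerts
```

The check uses a strict "greater than", matching the wording that a value must exceed its index before the terminal raises an alarm.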

The effect of the above technical scheme is as follows: through error record statistics, comparison between the statistical result and the error threshold value and warning, the operation maintenance efficiency and the operation maintenance strength of the data can be effectively improved.

An embodiment of the present invention provides an operation maintenance system for a big data cluster, as shown in fig. 2, the system includes:

the setting module is used for setting an initial value of a process running information scanning time interval and carrying out self-adaptive adjustment on the process running information scanning time interval according to the condition of carrying out running information scanning;

The working principle of the technical scheme is as follows: firstly, acquiring process information in a big data cluster through an acquisition module to obtain process running information of each component in the big data cluster; then, setting an initial value of a process running information scanning time interval by using a setting module, and carrying out self-adaptive adjustment on the process running information scanning time interval according to the condition of carrying out running information scanning; then, scanning whether a program error exists in each process of the tested assembly in the big data cluster by using the process running information through a judging module; if the program error exists, extracting an error type corresponding to the program error, and carrying out error statistics; then, a generating module is adopted to query a corresponding repairing strategy in a preset error code library according to the error type and generate a repairing instruction; and finally, repairing the program error through a repairing module according to the repairing instruction and the repairing strategy.

In one embodiment of the present invention, the setting module includes:

the initial value setting module is used for setting an initial value of a scanning time interval of the process running information and executing the scanning of each process of the tested assembly in the big data cluster by using the process running information according to the initial value of the scanning time interval;

the scanning module I is used for carrying out process scanning over three consecutive process running information scanning time intervals on the basis of the initial value of the process running information scanning time interval, namely three process scans; after the three process scans are finished, adjusting the process running information scanning time interval according to the time used by a single scan and the number of program errors found, to obtain the adaptively adjusted process running information scanning time interval; and scanning each process of the tested component in the big data cluster by using the process running information according to the adaptively adjusted process running information scanning time interval;

the self-adaptive adjustment module is used for adjusting the process running information scanning time interval according to the time used by single scanning and the program error quantity in the process after three times of scanning is continuously carried out at the process running information scanning time interval after self-adaptive adjustment, so as to obtain the process running information scanning time interval after self-adaptive adjustment again; according to the process running information scanning time interval after the self-adaptive adjustment again, the process running information is utilized to execute the scanning of each process of the tested component in the big data cluster; and continuously adjusting the process running information scanning time interval, and scanning each process of the tested assembly in the big data cluster by using the continuously adjusted process running information scanning time interval.

wherein $T_{i+1}$ denotes the information scanning time interval after the $(i+1)$-th adaptive adjustment, $i = 1, 2, 3, \ldots, n$; $n$ denotes the total number of adaptive adjustments of the information scanning time interval; $T_1$ denotes the initial value of the process running information scanning time interval; $N$ denotes the number of processes scanned in three consecutive scans; $N_c$ denotes the number of program errors found in three consecutive scans; $T_i$ denotes the information scanning time interval after the $i$-th adaptive adjustment; $T_{max}$ denotes the maximum time taken by a single process scan among the three scans; and $T_{min}$ denotes the minimum time taken by a single process scan among the three scans.

The working principle of the technical scheme is as follows: firstly, the initial value setting module sets a scanning time interval initial value of process running information, and each process of the tested component in the big data cluster is scanned by using the process running information according to the scanning time interval initial value; then, the scanning module I carries out process scanning over three consecutive process running information scanning time intervals on the basis of the initial value, namely three process scans; after the three process scans are finished, it adjusts the process running information scanning time interval according to the time used by a single scan and the number of program errors found, to obtain the adaptively adjusted process running information scanning time interval, and scans each process of the tested component in the big data cluster according to that adjusted interval; finally, after three further scans have been carried out at the adaptively adjusted process running information scanning time interval, the adaptive adjustment module adjusts the interval again according to the time used by a single scan and the number of program errors found, and each process of the tested component in the big data cluster is scanned by using the process running information according to the newly adjusted interval; the process running information scanning time interval is adjusted continuously in this way, and each process of the tested component in the big data cluster is scanned at the continuously adjusted interval.

Meanwhile, in the running process of each tested device of the big data cluster, the quantity of processes is different in each time period due to the difference between the quantity of executed tasks and the data quantity of a single task, and the scanning time for the processes is different, so that the change of the scanning time indirectly reflects the quantity of the tasks executed by each tested device in the running process; the self-adaptive adjustment of the process running information scanning time interval is carried out through the formula, so that the matching degree of the process running information scanning time interval and the actual conditions of the large data cluster running and the program error rate can be improved to a great extent. The overall operation and maintenance operation condition is highly matched with the actual operation condition of each tested device of the big data cluster.

In an embodiment of the present invention, the determining module includes:

The recording module includes:

and the warning module is used for issuing a warning prompt on the operation maintenance terminal when, in the statistical result of a tested component, either the number of error marks exceeds the error marking count index or the number of error types exceeds the error type number index of the error threshold.

The working principle of the technical scheme is as follows: firstly, when a program error is detected in a process of the tested component, the locking module locks the error log corresponding to the program error trigger point; then, the type determination module determines the error type from the error log; then, the marking module marks the process in which the program error occurred once and classifies the error type of that program error; then, the statistical module counts the number of error marks of the processes of the tested component and the error types of those processes to obtain a statistical result; and finally, the recording module sends the statistical result to the operation maintenance terminal of the big data cluster for recording.
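The lock, classify, mark, and count steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the log line format `<process> ERROR <ErrorType>: message`, the regular expression, and the function name `collect_error_statistics` are all hypothetical and are not specified by the invention.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log format: "<process> ERROR <ErrorType>: message".
ERROR_LINE = re.compile(r"^(?P<proc>\S+)\s+ERROR\s+(?P<etype>\w+):")


def collect_error_statistics(
        log_lines: Iterable[str]) -> Tuple[Counter, Dict[str, Counter]]:
    """Return (error marks per process, error-type counts per process)."""
    marks: Counter = Counter()
    types: Dict[str, Counter] = defaultdict(Counter)
    for line in log_lines:
        match = ERROR_LINE.match(line)   # locate the log entry for the error
        if match is None:
            continue                     # not an error entry
        proc, etype = match.group("proc"), match.group("etype")
        marks[proc] += 1                 # mark the process once per error
        types[proc][etype] += 1          # classify the error by type
    return marks, dict(types)
```

The two returned mappings together play the role of the statistical result that is sent to the operation maintenance terminal.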

The operation process of the recording module is as follows:

firstly, after the operation and maintenance terminal receives the statistical result, the comparison module controls the terminal to compare the statistical result with the error threshold preset in the operation and maintenance terminal for each tested component; then, when in the statistical result of a tested component either the number of error marks exceeds the error marking count index or the number of error types exceeds the error type number index of the error threshold, the warning module causes the operation and maintenance terminal to issue a warning prompt.
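The comparison and warning step can be sketched as follows. The sketch assumes that a warning is raised when either index is exceeded, which is one reading of the condition above; the class `ErrorThreshold`, its field names, and the function `components_to_warn` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ErrorThreshold:
    max_marks: int        # error marking count index (hypothetical field name)
    max_type_count: int   # error type number index (hypothetical field name)


def components_to_warn(marks: Dict[str, int],
                       type_counts: Dict[str, int],
                       thresholds: Dict[str, ErrorThreshold]) -> List[str]:
    """Return the tested components whose statistics exceed either index."""
    warned = []
    for component, limit in thresholds.items():
        too_many_marks = marks.get(component, 0) > limit.max_marks
        too_many_types = type_counts.get(component, 0) > limit.max_type_count
        if too_many_marks or too_many_types:   # either index exceeded
            warned.append(component)
    return warned
```

Keeping one threshold per tested component mirrors the statement that the error threshold is preset for each tested component in the terminal.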

The effect of the above technical scheme is as follows: through error record statistics, comparison of the statistical result with the error threshold, and warning, the efficiency and strength of operation and maintenance of the big data cluster can be effectively improved.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for operation and maintenance of a big data cluster, the method comprising:

setting an initial value of a process running information scanning time interval, and carrying out self-adaptive adjustment on the process running information scanning time interval according to the process running information scanning condition;

repairing the program error according to the repairing instruction and a repairing strategy;

wherein setting the initial value of the process running information scanning time interval and carrying out self-adaptive adjustment on the process running information scanning time interval according to the process running information scanning condition comprises the following steps:

thirdly, scanning each process of the tested component in the big data cluster by using the process running information according to the adaptively adjusted process running information scanning time interval;

fifthly, repeating the third step to the fourth step, continuously adjusting the process running information scanning time interval, and scanning each process of the tested component in the big data cluster by using the continuously adjusted process running information scanning time interval;

wherein $T_{i+1}$ represents the information scanning time interval after the $(i+1)$-th adaptive adjustment, $i = 1, 2, 3, \ldots, n$, and $n$ represents the total number of adaptive adjustments of the information scanning time interval; when $i = 1$, $T_1$ represents the initial value of the process running information scanning time interval; n represents the number of processes scanned in three consecutive scans; nc represents the number of program errors obtained in three consecutive scans; $T_i$ represents the information scanning time interval after the $i$-th adaptive adjustment; $T_{\max}$ represents the maximum value of the time used by a single process scan among the three scans; and $T_{\min}$ represents the minimum value of the time used by a single process scan among the three scans.

2. The method of claim 1, wherein the process running information is used to scan whether a program error exists in each process of the tested components in the big data cluster; if the program error exists, extracting an error type corresponding to the program error, and performing error statistics, wherein the error statistics comprises the following steps:

when the program error of the process of the tested component is detected, locking an error log corresponding to a program error trigger point according to the program error;

determining the error type according to the error log;

3. The method of claim 2, wherein sending the statistical result to an operation and maintenance terminal of the big data cluster for recording comprises:

and when, in the statistical result of a tested component, either the number of error marks exceeds the error marking count index or the number of error types exceeds the error type number index of the error threshold, the operation maintenance terminal gives an alarm prompt.

4. An operation and maintenance system for a big data cluster, the system comprising:

the setting module is used for setting an initial value of a process running information scanning time interval and carrying out self-adaptive adjustment on the process running information scanning time interval according to the process running information scanning condition;

the repair module is used for repairing the program error according to the repair instruction and the repair strategy;

wherein the setting module includes: an initial value setting module, used for setting an initial value of the process running information scanning time interval and scanning each process of the tested component in the big data cluster by using the process running information according to the initial value of the scanning time interval;

the self-adaptive adjustment module is used for, after three consecutive scans have been carried out at the adaptively adjusted process running information scanning time interval, adjusting the process running information scanning time interval according to the time used by a single scan and the number of program errors found, so as to obtain a re-adjusted process running information scanning time interval; scanning each process of the tested component in the big data cluster by using the process running information according to the re-adjusted process running information scanning time interval; and continuously adjusting the process running information scanning time interval and scanning each process of the tested component in the big data cluster by using the continuously adjusted process running information scanning time interval;

wherein $T_{i+1}$ represents the information scanning time interval after the $(i+1)$-th adaptive adjustment, $i = 1, 2, 3, \ldots, n$, and $n$ represents the total number of adaptive adjustments of the information scanning time interval; when $i = 1$, $T_1$ represents the initial value of the process running information scanning time interval; n represents the number of processes scanned in three consecutive scans; nc represents the number of program errors obtained in three consecutive scans; $T_i$ represents the information scanning time interval after the $i$-th adaptive adjustment; $T_{\max}$ represents the maximum value of the time used by a single process scan among the three scans; and $T_{\min}$ represents the minimum value of the time used by a single process scan among the three scans.

5. The system of claim 4, wherein the determining module comprises:

the locking module is used for locking an error log corresponding to a program error trigger point according to the program error when the program error of the process of the tested component is detected;

6. The system of claim 5, wherein the recording module comprises: