
CN112540900B - Real-time monitoring and analyzing method for large-scale parallel program - Google Patents

Info

Publication number
CN112540900B
CN112540900B CN201910892876.4A CN201910892876A CN112540900B CN 112540900 B CN112540900 B CN 112540900B CN 201910892876 A CN201910892876 A CN 201910892876A CN 112540900 B CN112540900 B CN 112540900B
Authority
CN
China
Prior art keywords
indexes
program
application program
judging
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910892876.4A
Other languages
Chinese (zh)
Other versions
CN112540900A (en)
Inventor
冯赟龙
刘勇
何王全
陈华蓉
宋佳伟
王敬宇
彭达佳
孙川
罗威
张威
梁艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910892876.4A
Publication of CN112540900A
Application granted
Publication of CN112540900B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a real-time monitoring and analyzing method for a large-scale parallel program, comprising the following steps. S1: select m performance indexes that reflect the running state of the program. S2: collect data for the selected running-state indexes. S3: combine the index data collected n consecutive times in S2 for the same process into a longitudinal vector, and calculate the cosine similarity of the same index between different processes. S4: calculate the other indexes of each problem process as in step S3; if the process is also judged to be a problem process by the values calculated for all the remaining indexes, judge it to be an abnormal process, and if the calculation result of one or more indexes does not exceed the threshold value, judge it to be a suspicious process. S5: output the normal, suspicious and abnormal processes obtained in S3 and S4 to a display screen. The invention reduces the overhead and interference imposed on the application program while monitoring and analyzing the parallel application program.

Description

Real-time monitoring and analyzing method for large-scale parallel program
Technical Field
The invention belongs to the technical field of computer parallel program optimization, and particularly relates to a real-time monitoring and analyzing method for a large-scale parallel program.
Background
Heterogeneous many-core processors have more complex hardware architectures, which makes application programs harder to debug and tune, and many applications carry latent errors and performance risks. Most high-performance computing applications run at large scale and for a long time, so errors such as deadlocks and hangs, or serious performance problems, are prone to occur while the program runs. These problems seriously affect the correctness and efficiency of the application software, and in turn the application output of the high-performance computing system and the progress of related scientific research projects.
Traditional parallel program monitoring software and systems generally monitor performance by collecting data through instrumentation or sampling and analyzing it after the program has finished running, as gprof does. To monitor the running state of a program, they usually capture the program's current execution stack information to help debug problems such as deadlocks and exceptions; STAT is a typical representative. When common instrumentation-based or sampling-based data acquisition methods are applied to a heterogeneous many-core system, the instrumentation code interferes with the application's own code and adds runtime overhead, while the sampling program competes with the application for a large amount of hardware resources. Methods based on hardware performance counters use hardware support to take over part of the software work of collecting performance data and so reduce overhead, but when the counters are accessed through system calls, those calls still introduce considerable overhead and interference.
Therefore, there is a need in the art for a monitoring and analyzing method for heterogeneous many-core processors that solves the problems of overhead and interference when monitoring massively parallel applications.
Disclosure of Invention
The invention aims to provide a real-time monitoring and analyzing method for a large-scale parallel program, which solves the problems of high cost and high interference in the monitoring and analyzing process of the large-scale parallel application program.
To achieve this purpose, the invention adopts the following technical scheme: a real-time monitoring and analyzing method for a large-scale parallel program, based on a heterogeneous many-core processor, comprising the following steps:
S1: selecting m performance indexes capable of reflecting the program running state, according to what the hardware performance counters support;
S2: collecting the selected performance index data, the collecting method comprising the following steps:
A1: acquiring, through a job management system, the list of computing nodes used by each process of the parallel application program;
A2: dividing tasks according to the list of computing nodes that the hardware maintenance interface can process at the same time, and establishing task groups and a task group queue;
A3: each subprocess of the parallel application program obtains a task group from the task queue of A2 and distributes it to a plurality of threads for execution;
A4: each thread calls the hardware maintenance interface and reads the data in the hardware performance counters on the physical computing nodes;
S31: before analyzing the parallel application program, selecting a run of the same parallel application program whose performance indexes are normal, forming the index data collected N consecutive times for the same process into a longitudinal vector, calculating the cosine similarity of the same index between different processes, and storing the cosine similarity calculated for each index as a threshold value;
S32: forming the index data collected n consecutive times in S2 for the same process into a longitudinal vector, and calculating the cosine similarity of the same index between different processes; according to the threshold value set in S31, if the cosine similarity of two different processes exceeds the threshold value, the two processes form a problem process pair; if a process forms problem process pairs with more than 2/3 of the processes of the parallel application program, the process is judged to be a problem process, and otherwise it is judged to be an undetermined process;
S4: calculating the other indexes of each problem process according to step S32; if the process is also judged to be a problem process by the values calculated for all the remaining indexes, judging it to be an abnormal process, and if the calculation result of one or more indexes does not exceed the threshold value, judging it to be a suspicious process;
calculating the other indexes of each undetermined process according to step S32; if the process is also judged to be an undetermined process by the values calculated for all the remaining indexes, judging it to be a normal process, and if the calculation result of one or more indexes does not exceed the threshold value, judging it to be a suspicious process;
S5: outputting the normal, suspicious and abnormal processes obtained in S32 and S4 to a display screen.
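For reference, the cosine similarity used in S31 and S32 can be written out as follows. The notation is introduced here for illustration and is not the patent's: x_{p,k}(t_i) denotes the value of index k collected for process p at the i-th of n consecutive collection times, and v_{p,k} is the resulting longitudinal vector.

```latex
v_{p,k} = \bigl(x_{p,k}(t_1),\, x_{p,k}(t_2),\, \dots,\, x_{p,k}(t_n)\bigr)^{\mathsf{T}},
\qquad
\operatorname{sim}_k(p,q) = \frac{v_{p,k}\cdot v_{q,k}}{\lVert v_{p,k}\rVert\,\lVert v_{q,k}\rVert}
= \frac{\sum_{i=1}^{n} x_{p,k}(t_i)\,x_{q,k}(t_i)}
       {\sqrt{\sum_{i=1}^{n} x_{p,k}(t_i)^2}\;\sqrt{\sum_{i=1}^{n} x_{q,k}(t_i)^2}}
```

Because hardware counter readings are non-negative, this value lies between 0 and 1, and values closer to 1 mean the two processes behaved more alike on index k over the window.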
Further improvements to the above technical scheme are as follows:
1. In the above scheme, in step A4, the hardware maintenance interface reads the data of the hardware performance counters on a plurality of physical computing nodes at the same time.
2. In the above scheme, in step S4, the index data of the suspicious process is retained and fed back to the display screen.
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The real-time monitoring and analyzing method for the large-scale parallel program acquires data by combining the hardware maintenance interface with the hardware performance counters, which avoids the overhead generated by system calls and improves data acquisition efficiency. In addition, because the processes of a parallel application program behave similarly, cosine similarity analysis reduces the number of indexes that need to be collected, so the analysis method further reduces overhead and interference.
Drawings
FIG. 1 is a schematic flow diagram of a real-time monitoring and analysis method for massively parallel programs;
FIG. 2 is a schematic diagram of longitudinal vector selection in the present invention;
FIG. 3 is a schematic diagram of the selection of transverse vectors in the present invention.
Detailed Description
The invention is further described below with reference to the following embodiment:
Embodiment: a real-time monitoring and analyzing method for a large-scale parallel program, based on a heterogeneous many-core processor, comprising the following steps:
S1: selecting m performance indexes capable of reflecting the program running state, according to what the hardware performance counters support;
S2: collecting the selected performance index data, such as the number of instructions executed in each processor cycle, the number of memory access instructions executed in each processor cycle, the idle processor cycle ratio, the instruction cache miss rate and the like, the collecting method comprising the following steps:
A1: acquiring, through a job management system, the list of computing nodes used by each process of the parallel application program;
A2: dividing tasks according to the list of computing nodes that the hardware maintenance interface can process at the same time, and establishing task groups and a task group queue;
here, the computing nodes that the hardware maintenance interface can process simultaneously follow a certain rule in their physical layout;
A3: each subprocess of the parallel application program obtains a task group from the task queue of A2 and distributes it to a plurality of threads for execution;
A4: each thread calls the hardware maintenance interface and reads the data in the hardware performance counters on the physical computing nodes;
the hardware maintenance interface reads the data of the hardware performance counters on a plurality of physical computing nodes at the same time;
S31: before analyzing the parallel application program, selecting a run of the same parallel application program whose performance indexes are normal, forming the index data collected N consecutive times for the same process into a longitudinal vector, calculating the cosine similarity of the same index between different processes, and storing the cosine similarity calculated for each index as a threshold value;
S32: forming the index data collected n consecutive times in S2 for the same process into a longitudinal vector, and calculating the cosine similarity of the same index between different processes; according to the threshold value set in S31, if the cosine similarity of two different processes exceeds the threshold value, the two processes form a problem process pair; if a process forms problem process pairs with more than 2/3 of the processes of the parallel application program, the process is judged to be a problem process, and otherwise it is judged to be an undetermined process;
S4: calculating the other indexes of each problem process according to step S32; if the process is also judged to be a problem process by the values calculated for all the remaining indexes, judging it to be an abnormal process, and if the calculation result of one or more indexes does not exceed the threshold value, judging it to be a suspicious process;
calculating the other indexes of each undetermined process according to step S32; if the process is also judged to be an undetermined process by the values calculated for all the remaining indexes, judging it to be a normal process, and if the calculation result of one or more indexes does not exceed the threshold value, judging it to be a suspicious process;
S5: outputting the normal, suspicious and abnormal processes obtained in S32 and S4 to a display screen.
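Step S31 stores the cosine similarity measured on a known-good run as the threshold for each index, but the text does not say how the many pairwise values are reduced to a single number. The following is a minimal sketch under one assumption: the threshold for an index is the lowest pairwise similarity seen in the normal run. The function names, the data layout and the index name "ipc" are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


def calibrate_thresholds(baseline):
    """Derive one threshold per index from a normal run.

    baseline[index][process] is the longitudinal vector (consecutive samples)
    of that index for that process. The reduction to the minimum pairwise
    similarity is an assumption, not something the patent specifies.
    """
    thresholds = {}
    for index, vectors in baseline.items():
        sims = [
            cosine_similarity(vectors[p], vectors[q])
            for p, q in combinations(sorted(vectors), 2)
        ]
        if not sims:
            raise ValueError("need at least two processes to calibrate")
        thresholds[index] = min(sims)
    return thresholds


# Usage: three processes of a normal run, one index, three samples each.
baseline = {"ipc": {0: [1, 2, 3], 1: [2, 4, 6], 2: [1, 2, 4]}}
thresholds = calibrate_thresholds(baseline)
```

Taking the minimum makes the baseline run itself produce no problem pairs; a stricter or looser reduction (a percentile, or the minimum minus a margin) would be an equally plausible reading.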
The core parts are: a software-hardware cooperative data acquisition method based on the maintenance interface, and an abnormal process detection method based on cosine similarity analysis of runtime longitudinal state vectors.
1. The software-hardware cooperative data acquisition method based on the maintenance interface (see FIG. 1):
The hardware maintenance interface can collect data from hundreds of computing nodes in a batch, but it has many tasks to process, so collection must not place too heavy a load on it. Only a few key indexes are therefore selected for collection, and the concurrency of data collection and processing is improved through task division, multiple processes and multiple threads:
Task division: according to the design of the maintenance interface, the processes whose data can be collected in a single call are grouped into one task;
Multi-process: a task queue and task groups are established, and each process takes one task group from the task queue for execution;
Multithreading: each process spawns multiple threads to execute the tasks in its task group.
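The task division, task group queue and per-group threading described above can be sketched as follows. This is an illustration only: read_counters is a hypothetical stand-in for one call to the hardware maintenance interface and returns dummy values, the counter names are invented, and the collector "processes" are modeled as Python threads so that the example stays self-contained.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def read_counters(nodes):
    """Hypothetical stand-in for one maintenance-interface call that reads the
    hardware performance counters of several nodes at once (dummy data)."""
    return {node: {"cycles": 0, "instructions": 0} for node in nodes}


def make_task_groups(node_list, nodes_per_call, tasks_per_group):
    """Split the node list into tasks (one interface call each), then bundle
    consecutive tasks into task groups."""
    tasks = [node_list[i:i + nodes_per_call]
             for i in range(0, len(node_list), nodes_per_call)]
    return [tasks[i:i + tasks_per_group]
            for i in range(0, len(tasks), tasks_per_group)]


def collector(group_queue, results, lock, threads_per_collector):
    """Take task groups off the queue until it is empty and run the tasks of
    each group on a small thread pool."""
    with ThreadPoolExecutor(max_workers=threads_per_collector) as pool:
        while True:
            try:
                group = group_queue.get_nowait()
            except queue.Empty:
                return
            for data in pool.map(read_counters, group):
                with lock:
                    results.update(data)


def collect(node_list, nodes_per_call=4, tasks_per_group=2,
            collectors=2, threads_per_collector=2):
    """Collect counter data for every node in node_list."""
    group_queue = queue.Queue()
    for group in make_task_groups(node_list, nodes_per_call, tasks_per_group):
        group_queue.put(group)
    results, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=collector,
                         args=(group_queue, results, lock, threads_per_collector))
        for _ in range(collectors)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


# Usage: ten nodes, four nodes per interface call, two tasks per group.
nodes = [f"node{i}" for i in range(10)]
data = collect(nodes)
```

The number of nodes per call would in practice follow from which nodes the maintenance interface can reach in a single call, which the text ties to their physical layout; the value 4 here is arbitrary.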
2. The abnormal process detection method based on cosine similarity analysis of runtime longitudinal state vectors (see FIGS. 2 and 3):
The subprocesses of a parallel program are created to accelerate a given task and often have highly similar code behaviors, so abnormal processes can be detected by analyzing how similar the runtime behaviors of the subprocesses are.
The runtime behavior of a process is represented by a runtime state vector. For the same process, the values of one index collected at different times form a longitudinal vector, and the values of several indexes collected at the same time form a transverse vector. Constructing a transverse vector, however, requires collecting enough performance indexes, which causes relatively large overhead and interference, so the method adopts longitudinal vectors:
(1) m important program running-state indexes are selected for monitoring;
(2) the data collected n consecutive times are formed into a vector and a similarity threshold is set; the cosine similarity of index 1 is calculated between different processes and the process pairs exceeding the threshold are marked; if a process is dissimilar to most of the processes (for example, more than 2/3), the process may have a problem;
(3) the other indexes are calculated as in step (2) and the results for all indexes are cross-checked; if all indexes yield the same abnormal processes, those processes are reported as abnormal with certainty, and if the results are inconsistent, the user takes part in the analysis.
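Steps (2) and (3) can be sketched as below. Two interpretations are assumed and should be read as such: a pair "exceeds the threshold" when its cosine similarity falls below the baseline threshold (that is, the two processes are less alike than in a normal run), and the 2/3 rule is counted against the other processes, excluding the process itself. The labels follow S4: flagged on every index is abnormal, flagged on none is normal, anything in between is suspicious. All names and the sample data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


def problem_processes(vectors, threshold):
    """Return the processes that form problem pairs with more than 2/3 of the
    other processes. vectors[process] is the longitudinal vector of one index."""
    procs = sorted(vectors)
    flagged = set()
    for p in procs:
        others = [q for q in procs if q != p]
        bad = sum(1 for q in others
                  if cosine_similarity(vectors[p], vectors[q]) < threshold)
        if others and bad * 3 > len(others) * 2:  # strictly more than 2/3
            flagged.add(p)
    return flagged


def classify(per_index_vectors, thresholds):
    """Cross-check all indexes and label each process."""
    flags = {index: problem_processes(vectors, thresholds[index])
             for index, vectors in per_index_vectors.items()}
    processes = set().union(*(v.keys() for v in per_index_vectors.values()))
    labels = {}
    for p in processes:
        hits = sum(1 for index in flags if p in flags[index])
        if hits == len(flags):
            labels[p] = "abnormal"
        elif hits == 0:
            labels[p] = "normal"
        else:
            labels[p] = "suspicious"
    return labels


# Usage: four processes, two indexes, three consecutive samples each.
per_index_vectors = {
    "ipc":  {0: [1, 1, 1], 1: [1, 1, 1], 2: [1, 1, 1], 3: [1, 0, 0]},
    "miss": {0: [1, 1, 1], 1: [1, 1, 1], 2: [0, 0, 1], 3: [1, 0, 0]},
}
labels = classify(per_index_vectors, {"ipc": 0.9, "miss": 0.9})
```

In this example process 3 diverges on both indexes and is labeled abnormal, process 2 diverges only on "miss" and is labeled suspicious, and processes 0 and 1 are labeled normal.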
In the figures, a, b, c, d and e within a process each represent one index of that process, shown as the series of data values collected for that index along the timeline.
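The difference between the two vector types can be made concrete with a small sketch. The data layout (samples[process][index] as a list over collection times) and the index names "a" and "b" are assumptions made for illustration, loosely following the labels used in the figures.

```python
def longitudinal_vector(samples, process, index, n):
    """The last n consecutive samples of one index for one process."""
    series = samples[process][index]
    if len(series) < n:
        raise ValueError("not enough samples collected yet")
    return series[-n:]


def transverse_vector(samples, process, indexes, t):
    """The values of several indexes for one process at one collection time t."""
    return [samples[process][index][t] for index in indexes]


# Usage: one process, two indexes, four collection times.
samples = {0: {"a": [1.0, 1.1, 0.9, 1.0], "b": [0.02, 0.03, 0.02, 0.02]}}
v_long = longitudinal_vector(samples, 0, "a", 3)
v_trans = transverse_vector(samples, 0, ["a", "b"], 1)
```

A longitudinal vector grows by sampling the same counter again, whereas a transverse vector is only as long as the number of distinct indexes collected, which is why the text regards it as the more expensive option.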
When the real-time monitoring and analyzing method for the large-scale parallel program is adopted, data are acquired by combining the hardware maintenance interface with the hardware performance counters, which avoids the overhead generated by system calls and improves data acquisition efficiency. In addition, because the processes of a parallel application program behave similarly, cosine similarity analysis reduces the number of indexes to be collected, so the analysis method further reduces overhead and interference.
To facilitate a better understanding of the invention, the terms used herein are briefly explained as follows:
Heterogeneous many-core processor: a processor that adopts a master-slave heterogeneous structure and consists of a control core and an operation core.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (2)

1. A real-time monitoring and analyzing method for a large-scale parallel program, based on a heterogeneous many-core processor, characterized by comprising the following steps:
S1: selecting m performance indexes capable of reflecting the program running state, according to what the hardware performance counters support;
S2: collecting the selected performance index data, wherein the collecting method comprises the following steps:
A1: acquiring, through a job management system, the list of computing nodes used by each process of the parallel application program;
A2: dividing tasks according to the list of computing nodes that the hardware maintenance interface can process at the same time, and establishing task groups and a task group queue;
A3: each subprocess of the parallel application program obtains a task group from the task queue of A2 and distributes it to a plurality of threads for execution;
A4: each thread calls the hardware maintenance interface and reads the data in the hardware performance counters on the physical computing nodes;
S31: before analyzing the parallel application program, selecting a run of the same parallel application program whose performance indexes are normal, forming the index data collected N consecutive times for the same process into a longitudinal vector, calculating the cosine similarity of the same index between different processes, and storing the cosine similarity calculated for each index as a threshold value;
S32: forming the index data collected n consecutive times in S2 for the same process into a longitudinal vector, and calculating the cosine similarity of the same index between different processes; according to the threshold value set in S31, if the cosine similarity of two different processes exceeds the threshold value, the two processes form a problem process pair; if a process forms problem process pairs with more than 2/3 of the processes of the parallel application program, the process is judged to be a problem process, and otherwise it is judged to be an undetermined process;
S4: calculating the other indexes of each problem process according to step S32; if the process is also judged to be a problem process by the values calculated for all the remaining indexes, judging it to be an abnormal process, and if the calculation result of one or more indexes does not exceed the threshold value, judging it to be a suspicious process;
calculating the other indexes of each undetermined process according to step S32; if the process is also judged to be an undetermined process by the values calculated for all the remaining indexes, judging it to be a normal process, and if the calculation result of one or more indexes does not exceed the threshold value, judging it to be a suspicious process;
S5: outputting the normal, suspicious and abnormal processes obtained in S32 and S4 to a display screen;
wherein, in step A4, the hardware maintenance interface reads the data of the hardware performance counters on a plurality of physical computing nodes at the same time.
2. The real-time monitoring and analyzing method for a large-scale parallel program according to claim 1, wherein in step S4 the index data of the suspicious process is retained and fed back to the display screen.
CN201910892876.4A 2019-09-20 2019-09-20 Real-time monitoring and analyzing method for large-scale parallel program Active CN112540900B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910892876.4A CN112540900B (en) 2019-09-20 2019-09-20 Real-time monitoring and analyzing method for large-scale parallel program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910892876.4A CN112540900B (en) 2019-09-20 2019-09-20 Real-time monitoring and analyzing method for large-scale parallel program

Publications (2)

Publication Number Publication Date
CN112540900A CN112540900A (en) 2021-03-23
CN112540900B CN112540900B (en) 2022-11-25

Family

ID=75012410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910892876.4A Active CN112540900B (en) 2019-09-20 2019-09-20 Real-time monitoring and analyzing method for large-scale parallel program

Country Status (1)

Country Link
CN (1) CN112540900B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080066061A1 (en) * 2006-07-27 2008-03-13 International Business Machines Corporation Method and Data Processing System for Solving Resource Conflicts in Assembler Programs
CN101650687A (en) * 2009-09-14 2010-02-17 清华大学 Large-scale parallel program property-predication realizing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
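"Global States Monitoring in Execution Control of Parallel Programs"; Janusz Borkowski et al.; 2008 International Symposium on Parallel and Distributed Computing; 2008-12-31; entire document *
"Design and Implementation of a Parallel Program Performance Analysis Simulation System" (并行程序性能分析仿真系统设计与实现); Zhang Peng (张鹏) et al.; Research and Exploration in Laboratory (实验室研究与探索); 2012-10-31; entire document *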

Also Published As

Publication number Publication date
CN112540900A (en) 2021-03-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant