JP5925202B2

JP5925202B2 - Method for quantifying and analyzing parallel processing of algorithms

Info

Publication number: JP5925202B2
Application number: JP2013518789A
Authority: JP
Inventors: Gwo Giun (Chris) Lee; He-Yuan Lin
Original assignee: National Cheng Kung University
Priority date: 2010-07-06
Filing date: 2011-07-05
Publication date: 2016-05-25
Anticipated expiration: 2031-07-05
Also published as: WO2012006285A1; EP2591414A1; EP2591414A4; JP2013530477A; KR20130038903A

Description

The present invention relates to a method for quantifying and analyzing the parallelism of an algorithm, and more particularly to a method for quantifying and analyzing the intrinsic parallelism of an algorithm.

G. M. Amdahl introduced a way of assessing how far an algorithm can be parallelized based on the fraction of the algorithm that must be executed sequentially ("Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference Proceedings, pp. 483-485, 1967).

A drawback of Amdahl's method is that the degree of parallelism it yields depends on the target platform on which the algorithm is executed, and not necessarily on the algorithm itself. The degree of parallelism obtained with Amdahl's method is therefore extrinsic to the algorithm and biased by the target platform.

A. Prihozhy et al. proposed a method for evaluating the parallelization potential of an algorithm based on the ratio between the complexity and the critical path length of the algorithm ("Evaluation of the parallelization potential for efficient multimedia implementations: dynamic evaluation of algorithm critical path," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 5, pp. 593-608, May 2005). Complexity is the total number of operations in the algorithm, and the critical path length is the largest number of operations that must be executed sequentially because of data dependencies between operations. This method may be able to determine the average degree of parallelism embedded in an algorithm, but it is insufficient for exhaustively characterizing the various multigrain parallelisms embedded in the algorithm.
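The two quantities defined above can be illustrated with a minimal Python sketch. The four-operation dependency graph `deps` is a hypothetical toy example, not taken from the cited paper, and dividing complexity by critical path length to get an average degree of parallelism is one common reading of the ratio, assumed here for illustration.

```python
from functools import lru_cache

# Hypothetical toy algorithm: each operation maps to the operations
# whose results it consumes (its data dependencies).
deps = {
    "a": [],
    "b": [],
    "c": ["a", "b"],
    "d": ["c"],
}

def critical_path_length(deps):
    """Largest number of operations that must run one after another."""
    @lru_cache(maxsize=None)
    def depth(op):
        # An operation can start only after all of its producers finish.
        return 1 + max((depth(p) for p in deps[op]), default=0)
    return max(depth(op) for op in deps)

complexity = len(deps)                    # total number of operations
critical = critical_path_length(deps)     # longest dependency chain
average_parallelism = complexity / critical
print(complexity, critical, average_parallelism)
```

A single number of this kind averages over the whole graph, which is why it cannot show which groups of operations are independent of each other.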

Accordingly, an object of the present invention is to provide a method for quantifying and analyzing the intrinsic parallelism of an algorithm that is not biased by the target hardware and/or software platform.

Accordingly, the method of the present invention for quantifying and analyzing the intrinsic parallelism of an algorithm is configured to be implemented by a computer and comprises the following steps:
a) configuring the computer to represent the algorithm by a plurality of operation sets;
b) configuring the computer to obtain a Laplacian matrix according to the plurality of operation sets;
c) configuring the computer to compute eigenvalues and eigenvectors of the Laplacian matrix; and
d) configuring the computer to obtain a set of information related to the intrinsic parallelism of the algorithm according to the eigenvalues and eigenvectors of the Laplacian matrix.

Other features and advantages of the present invention will become apparent in the following detailed description of the preferred embodiment with reference to the accompanying drawings, which are listed below.

FIG. 1 is a flowchart illustrating a preferred embodiment of the method of the present invention for quantifying and analyzing the intrinsic parallelism of an algorithm.
FIG. 2 is a schematic diagram illustrating data flow information for an example algorithm.
FIG. 3 is a schematic diagram illustrating an example set of data flow graphs.
FIG. 4 is a schematic diagram illustrating the operation sets of a 4×4 discrete cosine transform algorithm.
FIG. 5 is a schematic diagram illustrating an example composition of intrinsic parallelism corresponding to a dependency depth equal to 6.
FIG. 6 is a schematic diagram illustrating an example composition of intrinsic parallelism corresponding to a dependency depth equal to 5.
FIG. 7 is a schematic diagram illustrating an example composition of intrinsic parallelism corresponding to a dependency depth equal to 3.

Referring to FIG. 1, the preferred embodiment of the method of the present invention for evaluating the intrinsic parallelism of an algorithm is configured to be implemented by a computer and includes the following steps.

The degree of intrinsic parallelism indicates the degree of parallelism of the algorithm itself, without regard to the design and configuration of software and hardware. In other words, the method of the present invention is not constrained by software or hardware when analyzing the algorithm.

In step 11, the computer is configured to represent the algorithm by a plurality of operation sets. Each operation set may be an equation, program code, a flowchart, or any other form that expresses the algorithm. In the following example, the algorithm includes three operation sets O1, O2, and O3, expressed as follows:
O1 = A₁ + B₁ + C₁ + D₁,
O2 = A₂ + B₂ + C₂, and
O3 = A₃ + B₃ + C₃

Step 12 is to configure the computer to obtain a Laplacian matrix L_d according to the operation sets, and includes the following sub-steps.

In sub-step 121, the computer is configured to obtain data flow information related to the algorithm according to the operation sets. As shown in FIG. 2, the data flow information corresponding to the operation sets of this example may be expressed as follows:
Data 1 = A₁ + B₁
Data 2 = A₂ + B₂
Data 3 = A₃ + B₃
Data 4 = Data 1 + Data 7
Data 5 = Data 2 + C₂
Data 6 = Data 3 + C₃
Data 7 = C₁ + D₁
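The relationship between the operation sets and the data flow information can be sketched in Python as follows. The dictionary layout, the operand names without subscripts (`A1` for A₁, and so on), the helper `expand`, and the mapping of Data 4, Data 5, and Data 6 to O1, O2, and O3 are assumptions made for illustration; the patent does not prescribe a data structure.

```python
# Operation sets O1, O2, O3 from the example, each as its list of operands.
operation_sets = {
    "O1": ["A1", "B1", "C1", "D1"],
    "O2": ["A2", "B2", "C2"],
    "O3": ["A3", "B3", "C3"],
}

# Data flow information: each intermediate datum and the two operands it adds.
data_flow = {
    "Data1": ("A1", "B1"),
    "Data2": ("A2", "B2"),
    "Data3": ("A3", "B3"),
    "Data4": ("Data1", "Data7"),
    "Data5": ("Data2", "C2"),
    "Data6": ("Data3", "C3"),
    "Data7": ("C1", "D1"),
}

def expand(name):
    """Recursively replace intermediate data by the primary inputs they sum."""
    if name not in data_flow:
        return [name]
    left, right = data_flow[name]
    return expand(left) + expand(right)

# Assumed reading: Data 4, 5, and 6 are the results of O1, O2, and O3.
results = {"O1": "Data4", "O2": "Data5", "O3": "Data6"}
for op_set, datum in results.items():
    # Each result must consume exactly the operands of its operation set.
    assert sorted(expand(datum)) == sorted(operation_sets[op_set])
    print(op_set, "=", " + ".join(expand(datum)))
```

The check confirms that decomposing each sum into two-input additions leaves the operation sets unchanged.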

In sub-step 122, the computer is configured to obtain a data flow graph according to the data flow information. The data flow graph is composed of a plurality of vertices that denote operations in the algorithm, and a plurality of directed edges, each of which connects two corresponding vertices and indicates the source and destination of data in the algorithm. For the data flow information shown in FIG. 2, operator symbols V₁ to V₇ (the vertices) are used in place of the addition signs, and arrows (the directed edges) indicate the sources and destinations of data, which yields the data flow graph of FIG. 3. Specifically, V₁ denotes the addition A₁ + B₁, V₂ denotes A₂ + B₂, V₃ denotes A₃ + B₃, V₄ denotes Data 1 + Data 7, V₅ denotes Data 2 + C₂, V₆ denotes Data 3 + C₃, and V₇ denotes C₁ + D₁.
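Deriving the directed edges from the data flow information can be sketched as below. The names `data_flow` and `vertex_of` are hypothetical, and the rule that V_i is the addition producing Data i follows the vertex assignment described in the text.

```python
# Data flow information of the example (operand names without subscripts).
data_flow = {
    "Data1": ("A1", "B1"),
    "Data2": ("A2", "B2"),
    "Data3": ("A3", "B3"),
    "Data4": ("Data1", "Data7"),
    "Data5": ("Data2", "C2"),
    "Data6": ("Data3", "C3"),
    "Data7": ("C1", "D1"),
}

# Vertex V_i stands for the addition that produces Data i.
vertex_of = {f"Data{i}": f"V{i}" for i in range(1, 8)}

# A directed edge runs from the producer of a datum (source) to the
# operation that consumes it (destination). Primary inputs such as A1
# are not produced by any vertex and therefore create no edge.
edges = [
    (vertex_of[operand], vertex_of[result])
    for result, operands in data_flow.items()
    for operand in operands
    if operand in vertex_of
]
print(sorted(edges))
```

The resulting edges are V₁→V₄, V₇→V₄, V₂→V₅, and V₃→V₆.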

From the data flow graph shown in FIG. 3, it can be seen that operator symbol V₄ depends on V₁ and V₇. Similarly, V₅ depends on V₂, V₆ depends on V₃, and V₄, V₅, and V₆ are independent of one another.

In sub-step 123, the computer is configured to obtain the Laplacian matrix L_d according to the data flow graph. In the Laplacian matrix L_d, the i-th diagonal element gives the number of operator symbols connected to operator symbol V_i, and each off-diagonal element indicates whether the two corresponding operator symbols are connected. The Laplacian matrix L_d therefore expresses the data flow graph unambiguously in linear-algebraic form. The set of data flow graphs shown in FIG. 3 may be expressed as follows:

L_d =
[  1   0   0  -1   0   0   0 ]
[  0   1   0   0  -1   0   0 ]
[  0   0   1   0   0  -1   0 ]
[ -1   0   0   2   0   0  -1 ]
[  0  -1   0   0   1   0   0 ]
[  0   0  -1   0   0   1   0 ]
[  0   0   0  -1   0   0   1 ]

The Laplacian matrix L_d expresses the connectivity among operator symbols V₁ to V₇, and its first to seventh rows correspond to V₁ to V₇, respectively. For example, in the first row, operator symbol V₁ is connected to operator symbol V₄, and the matrix element (1, 4) is therefore -1.
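A minimal sketch of sub-step 123 in Python follows, built only from the definition given in the text (connection count on the diagonal, -1 for each connected pair). The function name `laplacian` and the edge list encoding are assumptions, and edge direction is ignored when filling the matrix.

```python
# Edges of the example data flow graph as (source, destination) vertex indices.
edges = [(1, 4), (7, 4), (2, 5), (3, 6)]
n = 7  # number of operator symbols V1..V7

def laplacian(n, edges):
    """Build the Laplacian: degree on the diagonal, -1 for connected pairs.

    Assumes the edge list has no duplicates and no self-loops.
    """
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = u - 1, v - 1        # convert 1-based labels to 0-based indices
        L[i][j] = L[j][i] = -1     # the two vertices are connected
        L[i][i] += 1               # one more connection for V_u
        L[j][j] += 1               # one more connection for V_v
    return L

L_d = laplacian(n, edges)
for row in L_d:
    print(row)
```

Every row sums to zero, which is the property that later makes the all-ones vector on each connected component an eigenvector with eigenvalue 0.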

In step 13, the computer is configured to compute the eigenvalues λ and the eigenvectors X_d of the Laplacian matrix L_d. For the Laplacian matrix L_d obtained in the above example, the eigenvalues λ and the eigenvectors X_d are as follows.
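Step 13 can be sketched in dependency-free Python using the classical cyclic Jacobi rotation method for symmetric matrices. This is a stand-in chosen for illustration: the patent does not say which eigensolver is used, and the function name `jacobi_eigen` and its tolerances are assumptions.

```python
import math

def jacobi_eigen(A, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi)."""
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                      # matrix is numerically diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle in [-pi/4, pi/4] that zeroes A[p][q].
                if diff != 0:
                    theta = 0.5 * math.atan(2 * A[p][q] / diff)
                else:
                    theta = math.copysign(math.pi / 4, A[p][q])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):         # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):         # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):         # V <- V J accumulates eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [A[i][i] for i in range(n)]
    eigenvectors = [[V[k][i] for k in range(n)] for i in range(n)]
    return eigenvalues, eigenvectors

# Laplacian of the example graph (edges V1-V4, V7-V4, V2-V5, V3-V6).
L_d = [[0] * 7 for _ in range(7)]
for u, v in [(1, 4), (7, 4), (2, 5), (3, 6)]:
    L_d[u - 1][v - 1] = L_d[v - 1][u - 1] = -1
    L_d[u - 1][u - 1] += 1
    L_d[v - 1][v - 1] += 1

eigenvalues, eigenvectors = jacobi_eigen(L_d)
print(sorted(round(lam, 6) for lam in eigenvalues))
```

Because a rotation is applied only where an off-diagonal entry is nonzero, vertices in different connected parts of the graph are never mixed, so each computed eigenvector is nonzero only on one connected part. A general-purpose solver is free to return any basis of the eigenvalue-0 subspace, so that property should not be assumed of other libraries.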

In step 14, the computer is configured to obtain a set of information related to the intrinsic parallelism of the algorithm according to the eigenvalues λ and the eigenvectors X_d of the Laplacian matrix L_d. This set of information is defined in a strict sense so as to identify, among the operation sets, the independent operation sets that do not depend on one another and can therefore be executed in parallel. The set of information related to strict parallelism includes a degree of strict parallelism, which indicates the number of independent operation sets of the algorithm, and a set of compositions of strict parallelism corresponding respectively to the operation sets. According to the spectral graph theory presented by F. R. K. Chung (Regional Conference Series in Mathematics, No. 92, 1997), the number of connected components of a graph is equal to the number of eigenvalues of its Laplacian matrix that are equal to 0. The degree of strict parallelism embedded in the algorithm is therefore equal to the number of eigenvalues λ that are equal to 0. Moreover, based on spectral graph theory, the compositions of strict parallelism may be identified from the eigenvectors X_d associated with the eigenvalues λ that are equal to 0.

In the above example there are three Laplacian eigenvalues equal to 0, so the set of data flow graphs consists of three independent operation sets. The degree of strict parallelism embedded in the example algorithm is therefore equal to 3. The first, second, and third eigenvectors in X_d are the ones associated with the eigenvalues λ equal to 0. In the first eigenvector, the entries corresponding to operator symbols V₁, V₄, and V₇ are nonzero, which means that V₁, V₄, and V₇ are dependent and form one connected component of the data flow graph (V₁-V₄-V₇). Similarly, from the second and third eigenvectors associated with the eigenvalues λ equal to 0, it can be seen that V₂ and V₅ are dependent, that V₃ and V₆ are dependent, and that they form the remaining two connected components of the data flow graph (V₂-V₅ and V₃-V₆), respectively. The computer is thus configured to obtain a degree of strict parallelism equal to 3, together with the compositions of strict parallelism, which may be expressed in the form of a graph (as shown in FIG. 3), a table, an equation, or program code.
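The link between connected components and zero eigenvalues can be cross-checked without an eigensolver. The sketch below finds the connected components by graph traversal (a swapped-in technique, not the spectral procedure of the patent) and then verifies that the 0/1 indicator vector of each component x satisfies L_d x = 0, i.e. that it is an eigenvector with eigenvalue 0. The function name `connected_components` is hypothetical.

```python
edges = [(1, 4), (7, 4), (2, 5), (3, 6)]
n = 7

# Laplacian of the example graph, built from its definition.
L_d = [[0] * n for _ in range(n)]
for u, v in edges:
    L_d[u - 1][v - 1] = L_d[v - 1][u - 1] = -1
    L_d[u - 1][u - 1] += 1
    L_d[v - 1][v - 1] += 1

def connected_components(n, edges):
    """Connected components of the undirected graph, by depth-first search."""
    adjacent = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, components = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adjacent[v] - seen:
                seen.add(w)
                stack.append(w)
        components.append(sorted(component))
    return components

components = connected_components(n, edges)
for component in components:
    # Indicator vector: 1 on the component, 0 elsewhere.
    x = [1 if v in component else 0 for v in range(1, n + 1)]
    L_x = [sum(L_d[i][j] * x[j] for j in range(n)) for i in range(n)]
    assert all(value == 0 for value in L_x)  # eigenvalue 0 confirmed

degree_of_strict_parallelism = len(components)
print(degree_of_strict_parallelism, components)
```

The nonzero entries of each indicator vector name the operator symbols of one independent operation set, which mirrors how the text reads the compositions off the eigenvectors.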

In step 15, the computer is configured to obtain a plurality of sets of information related to multigrain parallelism of the algorithm according to at least one of a plurality of dependency depths of the algorithm and the set of information related to strict parallelism. The sets of information related to multigrain parallelism include a set of information related to broad-sense parallelism of the algorithm, which characterizes all possible parallelism embedded in the independent operation sets.

Note that a dependency depth of an algorithm represents the associated sequential steps that are essential to processing the algorithm, and is thus complementary to the potential parallelism of the algorithm. Information related to different intrinsic parallelisms of the algorithm may therefore be obtained based on different dependency depths. In particular, the information related to strict parallelism is the information related to intrinsic parallelism that corresponds to the maximum dependency depth of the algorithm, and the information related to broad-sense parallelism is the information related to intrinsic parallelism that corresponds to the minimum dependency depth.

For example, the above algorithm includes two different compositions of strict parallelism, namely V₁-V₄-V₇ and V₂-V₅ (V₃-V₆ resembles V₂-V₅ and is regarded as the same composition). In the composition V₁-V₄-V₇, operator symbols V₁ and V₇ are independent of each other, which means that V₁ and V₇ can be processed in parallel. The set of information related to broad-sense parallelism of the algorithm therefore includes a degree of broad-sense parallelism equal to 4, and the compositions of broad-sense parallelism are similar to the compositions of strict parallelism.
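The text does not spell out how the value 4 is computed. One way to reproduce it, offered here only as an assumption, is to place every operation at the earliest level its dependencies allow and take the widest level as the degree of broad-sense parallelism. The names `deps` and `asap_levels` are hypothetical.

```python
# Dependencies of the example: each operator symbol maps to its producers.
deps = {1: [], 2: [], 3: [], 4: [1, 7], 5: [2], 6: [3], 7: []}

def asap_levels(deps):
    """Group operations by the earliest level at which each one can run."""
    depth = {}

    def visit(v):
        if v not in depth:
            # One level after the deepest producer; level 1 if no producers.
            depth[v] = 1 + max((visit(p) for p in deps[v]), default=0)
        return depth[v]

    for v in deps:
        visit(v)
    by_level = {}
    for v, d in depth.items():
        by_level.setdefault(d, []).append(v)
    return [sorted(by_level[d]) for d in sorted(by_level)]

levels = asap_levels(deps)
degree_of_broad_sense_parallelism = max(len(level) for level in levels)
print(levels, degree_of_broad_sense_parallelism)
```

For the example, V₁, V₂, V₃, and V₇ share the first level and V₄, V₅, and V₆ the second, so the widest level holds four operations, matching the degree stated above.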

According to the method of this embodiment, the degree of broad-sense parallelism of the above algorithm is equal to 4. A single processing element requires seven processing cycles to implement the algorithm, because the algorithm includes seven operator symbols V₁ to V₇. Given the degree of strict parallelism equal to 3, implementing the algorithm with three processing elements takes three processing cycles. Given the degree of broad-sense parallelism equal to 4, implementing the algorithm with four processing elements takes two processing cycles. Furthermore, even if more processing elements are used, at least two processing cycles are still required to implement the algorithm. The optimal number of processing elements for implementing the algorithm can therefore be obtained according to the method of this embodiment.
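The cycle counts quoted above can be reproduced with two simple scheduling models, sketched below under stated assumptions: every operation takes one cycle, strict parallelism assigns each independent operation set to its own processing element, and broad-sense parallelism runs the operations level by level. The function names and both models are illustrative, not taken from the patent.

```python
import math

# Results of the analysis of the example algorithm.
components = [[1, 4, 7], [2, 5], [3, 6]]   # independent operation sets
levels = [[1, 2, 3, 7], [4, 5, 6]]         # operations grouped by dependency level

def cycles_one_element_per_component(components):
    """Each independent set runs sequentially on its own processing element."""
    return max(len(component) for component in components)

def cycles_level_by_level(levels, n_elements):
    """Each level needs ceil(width / n_elements) cycles; levels run in order."""
    return sum(math.ceil(len(level) / n_elements) for level in levels)

print(cycles_level_by_level(levels, 1))              # one processing element
print(cycles_one_element_per_component(components))  # three processing elements
print(cycles_level_by_level(levels, 4))              # four processing elements
print(cycles_level_by_level(levels, 8))              # more elements than needed
```

Under these assumptions the counts are 7, 3, 2, and 2 cycles. Going beyond four processing elements gives no further reduction because the two dependency levels must still run one after the other.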

Taking a 4×4 discrete cosine transform (DCT) as an example, the operation sets of the DCT algorithm are represented by the data flow graph shown in FIG. 4. Since the 4×4 DCT is well known to those skilled in the art, further details are omitted here for brevity. It can be seen from FIG. 4 that the maximum dependency depth of the 4×4 DCT algorithm is equal to 6. For the maximum dependency depth (that is, 6), the compositions of strict parallelism of this algorithm may be obtained as shown in FIG. 5, and the degree of strict parallelism of this algorithm is equal to 4 according to the method of this embodiment. When the intrinsic parallelism of the 4×4 DCT algorithm is analyzed at a dependency depth equal to 5, the compositions of intrinsic parallelism may be obtained as shown in FIG. 6, and the degree of intrinsic parallelism is equal to 8. Furthermore, when the intrinsic parallelism of the 4×4 DCT algorithm is analyzed at a dependency depth equal to 3, the compositions of intrinsic parallelism may be obtained as shown in FIG. 7, and the degree of intrinsic parallelism is equal to 16.

In summary, the method according to the present invention may be used to evaluate the intrinsic parallelism of an algorithm.

While the present invention has been described in connection with what is considered the most practical and preferred embodiment, it is understood that the invention is not limited to the disclosed embodiment but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation, so as to encompass all such modifications and equivalent arrangements.

Claims

A method for quantifying and analyzing the intrinsic parallelism of an algorithm, said method being configured to be implemented by a computer and comprising the steps of:
a) configuring the computer to represent the algorithm by a plurality of operation sets;
b) configuring the computer to obtain a Laplacian matrix according to the plurality of operation sets;
c) configuring the computer to compute eigenvalues and eigenvectors of the Laplacian matrix; and
d) configuring the computer to obtain a set of information related to the intrinsic parallelism of the algorithm according to the eigenvalues and the eigenvectors of the Laplacian matrix,
wherein step d) includes the sub-steps of:
d1) configuring the computer to obtain a set of information related to strict parallelism of the algorithm according to the eigenvalues and the eigenvectors of the Laplacian matrix; and
d2) configuring the computer to obtain a set of information related to multigrain parallelism of the algorithm according to at least one of a plurality of dependency depths of the algorithm and the set of information related to strict parallelism.

The method of claim 1, wherein step b) includes the sub-steps of:
b1) configuring the computer to obtain data flow information related to the algorithm according to the operation sets;
b2) configuring the computer to obtain, according to the data flow information, a data flow graph composed of a plurality of vertices that denote operations in the algorithm and a plurality of directed edges, each of which indicates the interconnection between two corresponding vertices and the source and destination of data in the algorithm; and
b3) configuring the computer to obtain the Laplacian matrix according to the data flow graph.

The method of claim 1, wherein the set of information related to strict parallelism includes a degree of strict parallelism representing the number of independent operation sets of the algorithm, and a set of compositions of strict parallelism corresponding respectively to the operation sets.

The method of claim 1, wherein, in sub-step d2), the computer is configured to obtain a plurality of sets of information related to multigrain parallelism of the algorithm according to the dependency depths, respectively, and the set of information related to strict parallelism.

The method of claim 4, wherein each of the plurality of sets of information related to multigrain parallelism of the algorithm includes a degree of multigrain parallelism and a set of compositions of multigrain parallelism.

The method of claim 1, wherein the set of information related to multigrain parallelism includes a set of information related to broad-sense parallelism of the algorithm that is obtained according to the minimum one of the dependency depths and the set of information related to strict parallelism.

The method of claim 6, wherein the set of information related to broad-sense parallelism includes a degree of broad-sense parallelism characterizing all possible parallelism embedded in the independent operation sets of the algorithm, and a set of compositions of broad-sense parallelism.

The method of claim 1, wherein, in sub-step d1), the degree of strict parallelism is, based on spectral graph theory, equal to the number of the eigenvalues that are equal to zero.

The method of claim 1, wherein the set of information related to multigrain parallelism includes a degree of multigrain parallelism and a set of compositions of multigrain parallelism.

A computer program product comprising a machine-readable storage medium storing program instructions that, when executed, cause a computer to perform the method for quantifying and analyzing the intrinsic parallelism of an algorithm according to claim 1.