JP6622938B1

JP6622938B1 - Correlation extraction method and correlation extraction program

Info

Publication number: JP6622938B1
Application number: JP2019053874A
Authority: JP
Inventors: 知弘米田; 健吉加藤; 翔太山根
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2019-03-20
Filing date: 2019-03-20
Publication date: 2019-12-18
Anticipated expiration: 2039-03-20
Also published as: JP2020154890A

Abstract

[Problem] To automatically extract suitable correlation conditions even when the analysis data includes different types of data, or when the analyst does not understand the contents of the variables constituting the analysis data. [Solution] A correlation extraction program causes a computer to perform: a step of receiving designation of two variables among a plurality of variables constituting the analysis data; a step of calculating straight lines passing through the center of gravity of the analysis data in a scatter diagram of these two variables; a step of extracting, for each straight line, the data whose deviation from that line does not exceed a threshold; a step of calculating a correlation coefficient from each set of extracted data; a step of calculating the conditional probability of each single variable and/or combination of variables; and a step of displaying the single variables and/or combinations of variables on a display unit based on the correlation coefficients and the conditional probabilities. [Selected Drawing] FIG. 2

Description

The present invention relates to a correlation extraction method and a correlation extraction program.

In data analysis, it is important to extract variables that correlate with the objective variable. At present, because a wide variety of data is mixed together, analysts perform manual work such as visualization and specify conditions in order to observe trends. This manual work relies on past rules of thumb and statistical methods. In manual data analysis, the computer calculates the correlation coefficient and the analyst checks the trend in the data. However, when the conditions are not specified well, for example when data of other types is included, variables that are actually correlated may not be extracted.

Techniques for automatically calculating data correlation have therefore been disclosed to reduce the burden on the analyst. For example, the solution section of Patent Document 1 states: "Abnormal values of the objective variable are removed. The degree of association between the objective variable and a plurality of explanatory variables is calculated, a plurality of explanatory variables with a high degree of association are extracted, and the degree of independence among them is calculated. Based on the degree of association and the degree of independence, a plurality of candidate explanatory variables that are likely to have a large influence on the objective variable are selected. Based on the cumulative contribution rate, explanatory variables with a high contribution rate to the objective variable are selected from the candidates, and a regression equation is calculated to obtain a predicted value of the objective variable. The difference between the predicted value and the measured value of the objective variable is taken as a new objective variable, the explanatory variables remaining after excluding those used to obtain this difference are taken as new explanatory variables, and the same processing is repeated."

Patent Document 1: JP 2007-329415 A

The invention described in Patent Document 1 is effective for data produced under constant conditions. However, when various kinds of data are mixed together, it cannot be determined whether an observed effect is due to the ordinary phenomenon or to the mixing of data, so the invention is difficult to apply.
For example, when analyzing machine usage time and the number of part replacements, if data for different parts is mixed in, the characteristics of the data are buried and the explanatory variables that strongly influence the objective variable may not be extracted correctly. Furthermore, merely picking out the explanatory variables that strongly influence the objective variable does not reveal knowledge hidden in the data, for example a condition such as the influence on the objective variable being large when an explanatory variable lies in a certain range.
In addition, when analyzing data manually, the analyst had to understand the contents of each variable constituting the data.

Accordingly, an object of the present invention is to automatically extract suitable correlation conditions even when the analysis data includes different types of data, or when the analyst does not understand the contents of the variables constituting the analysis data.

To solve the above problem, the correlation extraction method of the present invention is characterized in that a computer performs: a step of receiving designation of two variables among a plurality of variables constituting analysis data; a step of calculating straight lines passing through the center of gravity of the analysis data in a scatter diagram of the two variables; a step of extracting, for each straight line, the data whose deviation from that line does not exceed a threshold; a step of calculating a correlation coefficient from each set of data; a step of taking out, from each set of extracted data, the single variables and/or combinations of variables whose appearance ratio is larger than a predetermined value; and a step of displaying the single variables and/or combinations of variables on a display unit based on the correlation coefficients and the appearance ratios.

The correlation extraction program of the present invention causes a computer to execute: a step of receiving designation of two variables among a plurality of variables constituting analysis data; a step of calculating straight lines passing through the center of gravity of the analysis data in a scatter diagram of the two variables; a step of extracting, for each straight line, the data whose deviation from that line does not exceed a threshold; a step of calculating a correlation coefficient from each set of data; a step of taking out, from each set of extracted data, the single variables and/or combinations of variables whose appearance ratio is larger than a predetermined value; and a step of displaying the single variables and/or combinations of variables on a display unit based on the correlation coefficients and the appearance ratios.
Other means are described in the embodiments for carrying out the invention.

According to the present invention, suitable correlation conditions can be extracted automatically even when the analysis data includes different types of data, or when the analyst does not understand the contents of the variables constituting the analysis data.

A configuration diagram of a computer that executes the correlation extraction method.
A flowchart showing the correlation extraction process.
A diagram explaining the operation of identifying the center of gravity of the scatter diagram of the two selected variables.
A diagram explaining the operation of drawing a straight line passing through the center of gravity of the scatter diagram of the two selected variables.
A diagram explaining the operation of extracting data whose deviation from the straight line does not exceed the threshold.
A diagram explaining the operation of narrowing the extracted data down to the data that satisfies the condition.
A flowchart (part 1) showing the operation of extracting the single variables and/or combinations of variables that satisfy the condition, together with their conditional probabilities.
A flowchart (part 2) showing the operation of extracting the single variables and/or combinations of variables that satisfy the condition, together with their conditional probabilities.
A histogram of variable A.
A diagram showing the operation of expanding the range of variable A.
A diagram in which the appearance ratio of variable A is determined from the mode of variable A.
A diagram in which the appearance ratio of variable Z is determined from the mode of variable Z.
An initial setting screen for correlation extraction.
A diagram showing the result of extracting correlations.
A flowchart showing the correlation extraction process for clustered analysis data.
A diagram explaining the scatter diagram of the two selected variables.
A diagram explaining the operation of clustering the analysis data in the scatter diagram of the two selected variables and identifying the center of gravity of each cluster.
A diagram explaining the operation of identifying the straight line in each cluster of the scatter diagram of the two selected variables.

Embodiments for carrying out the present invention are described in detail below with reference to the drawings.
FIG. 1 is a configuration diagram of a computer that executes the correlation extraction method.
The computer 1 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, and a storage unit 16. This computer 1 is common to the first and second embodiments described later.

The CPU 11 executes programs stored in the ROM 12, the RAM 13, or the storage unit 16, and processes data stored in the ROM 12, the RAM 13, or the storage unit 16.
The ROM 12 is composed of nonvolatile memory and stores, for example, a BIOS (Basic Input/Output System). The RAM 13 is composed of volatile memory and is used for, among other things, variables that a program stores temporarily. The storage unit 16 is composed of a large-capacity storage device such as a hard disk or an SSD (Solid State Drive), and stores the analysis data 161 and the correlation extraction program 162.

The computer 1 further includes an input unit 14 and a display unit 15.
The input unit 14 is, for example, a keyboard or a mouse, and is used to input various information to the computer 1.
The display unit 15 is, for example, a liquid crystal display, and is used by the computer 1 to display processing results and the like.

<< First Embodiment >>
The correlation extraction program 162 of the first embodiment is described below with reference to FIGS. 2 to 11. With this correlation extraction program 162, suitable correlation conditions can be extracted automatically even when the analysis data 161 includes different types of data, or when the analyst does not understand the contents of the variables constituting the analysis data 161.

FIG. 2 is a flowchart showing the correlation extraction process. This flowchart is explained together with the graphs of FIGS. 3 to 6 below.
The following steps are executed when the CPU 11 reads and executes the correlation extraction program 162.
The CPU 11 displays on the display unit 15 an initial setting screen that includes a menu of objective variables and a menu of explanatory variables. This initial setting screen is described later with FIG. 10. The user uses the input unit 14 to select an objective variable and an explanatory variable from the menus displayed on the display unit 15. The CPU 11 thereby receives a designation that makes two of the variables constituting the analysis data the objective variable and the explanatory variable (S10), and starts the series of operations of steps S11 to S19.

In step S11, the CPU 11 calculates the center of gravity 21 (see FIG. 3) of the analysis data 161 in the scatter diagram formed by the two input variables. The center of gravity 21 is explained using the graph of FIG. 3.

FIG. 3 is a diagram explaining the operation of identifying the center of gravity 21 in the scatter diagram of machine usage time versus number of part replacements for the analysis data 161. The horizontal axis of this scatter diagram is the machine usage time. The vertical axis is the number of part replacements.

Specifically, the CPU 11 calculates the mean machine usage time over the analysis data 161. This gives the horizontal-axis coordinate of the center of gravity 21. Next, the CPU 11 calculates the mean number of part replacements over the analysis data 161. This gives the vertical-axis coordinate of the center of gravity 21.
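As an illustration only, and not part of the patent text, the center-of-gravity calculation of step S11 could be sketched in Python as follows. The function name `centroid` and the sample values are assumptions introduced here.

```python
from statistics import fmean


def centroid(points):
    """Return (mean of x, mean of y) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return fmean(xs), fmean(ys)


# Hypothetical records: (machine usage time, number of part replacements)
sample = [(100.0, 1.0), (200.0, 2.0), (300.0, 6.0)]
cx, cy = centroid(sample)
```

Here `cx` plays the role of the horizontal-axis coordinate and `cy` the vertical-axis coordinate of the center of gravity 21.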

In step S12, the CPU 11 draws a line passing through the center of gravity 21 and takes it as the straight line 3. In this way, the CPU 11 calculates each straight line passing through the center of gravity in the scatter diagram of machine usage time and number of part replacements. Next, in steps S13 to S16, the CPU 11 rotates the straight line 3. This line is explained using the graph of FIG. 4.

FIG. 4 is a diagram explaining the operation of drawing the straight line 3 passing through the center of gravity 21 of the scatter diagram of the two selected variables. Specifically, the CPU 11 draws the straight line 3 through the center of gravity 21. The CPU 11 then rotates this straight line 3 in 1-degree increments starting from 0 degrees, repeating until it reaches 180 degrees. The rotation step is not limited to 1 degree, however; the line may be rotated in increments of any predetermined angle.
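The family of candidate lines described above can be sketched as follows. This is a minimal illustration under stated assumptions: each line is represented by its angle to the horizontal axis and by a unit direction vector, the angle is measured in raw data units without axis scaling, and the names `candidate_angles` and `line_direction` are hypothetical.

```python
import math


def candidate_angles(step_deg=1):
    """Angles in degrees from 0 up to, but not including, 180.

    A line rotated by 180 degrees coincides with the line at 0 degrees,
    so the end point is left out of this sketch.
    """
    return list(range(0, 180, step_deg))


def line_direction(angle_deg):
    """Unit direction vector of a line at the given angle to the horizontal axis."""
    theta = math.radians(angle_deg)
    return math.cos(theta), math.sin(theta)
```

Each angle, together with the center of gravity, fully determines one candidate straight line 3.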

At each rotation, the CPU 11 extracts the data 2 associated with the straight line 3 so that it makes up a predetermined proportion (for example, 25%) of all the analysis data 161 (S13). This extraction of the data 2 is explained using the graph of FIG. 5.

FIG. 5 is a diagram explaining the operation of extracting the data 2 whose deviation from the straight line 3 does not exceed the threshold. The CPU 11 calculates the deviation between each data point 2 and the straight line 3, sets the threshold so that the number of data points whose deviation does not exceed it is, for example, 25% of the number of data points in the analysis data 161, and extracts those data points. Specifically, the CPU 11 associates each data point 2 with its deviation from the straight line 3. The CPU 11 then sorts the data points 2 in ascending order of deviation from the straight line 3 and extracts the 25% of data points with the smallest deviations.
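The sort-and-take extraction of step S13 can be sketched as follows. This section does not define how the deviation is measured, so the sketch assumes the perpendicular distance from a point to the line; the function names and the `fraction` parameter are also assumptions.

```python
import math


def deviation(point, center, angle_deg):
    """Perpendicular distance from `point` to the line through `center` at `angle_deg`.

    Assumption: the deviation is taken to be the perpendicular distance.
    """
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Magnitude of the cross product of the offset with the unit direction vector.
    return abs(dx * math.sin(theta) - dy * math.cos(theta))


def extract_nearest(points, center, angle_deg, fraction=0.25):
    """Return the `fraction` of points closest to the line and the implied threshold."""
    if not points:
        return [], 0.0
    ranked = sorted(points, key=lambda p: deviation(p, center, angle_deg))
    keep = max(1, int(len(ranked) * fraction))
    selected = ranked[:keep]
    threshold = deviation(selected[-1], center, angle_deg)
    return selected, threshold
```

The returned threshold is simply the largest deviation among the kept points, which matches the idea of choosing the threshold so that a fixed share of the data falls within it.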

From the data 2 extracted in step S13, the CPU 11 calculates the single variables and/or combinations of variables with a large appearance ratio, their ranges, and their conditional probabilities (S14). The processing of step S14 is described in detail later with FIGS. 7A and 7B. As a result, as shown in FIG. 6, the extracted data 2 is narrowed down further to the data satisfying the predetermined conditions.

Specifically, in step S14, the CPU 11 extracts the data 2 whose deviation from the straight line 3 does not exceed the threshold, and extracts the conditions and features common to that data 2. Common conditions and features here are, for example, that the production region is the same, or that both the production region and the region of use are the same. The more data is extracted, the more reliable the derived correlation. The necessary conditions can therefore be extracted automatically by using the conditions from which a highly reliable correlation is derived.

The CPU 11 rotates the straight line 3 by a further 1 degree (S15) and determines whether it has been rotated all the way to 180 degrees (S16). If the straight line 3 has not yet been rotated to 180 degrees (No), the CPU 11 returns to step S13; if it has (Yes), the CPU 11 proceeds to step S17. That is, in steps S13 to S16, the CPU 11 rotates the straight line passing through the center of gravity 1 degree at a time to obtain each straight line 3.

In step S17, the CPU 11 calculates the correlation coefficient from the data 2 and, from this correlation coefficient and the conditional probability, calculates the evaluation value of each single variable and/or combination of variables.
The CPU 11 sorts the single variables and/or combinations of variables in descending order of evaluation value (S18), displays on the display unit 15 the analysis result (see FIG. 11) containing the sorted single variables and/or combinations of variables (S19), and ends the processing of FIG. 2.
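Steps S17 and S18 can be sketched as follows, using the definition given in the description of the analysis result: the evaluation value is the product of the correlation coefficient and the conditional probability. The sketch assumes the Pearson correlation coefficient, which this section does not name explicitly, and it leaves the sign of the coefficient untouched because the text does not say how negative correlations are treated. The function names and tuple layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # Correlation is undefined for constant data; report 0 in this sketch.
    return sxy / math.sqrt(sxx * syy)


def rank_conditions(candidates):
    """Score each (label, correlation, conditional_probability) and sort descending."""
    scored = [(label, r * p) for label, r, p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_conditions([("A", 0.5, 0.5), ("A and Z", 0.9, 0.8)])` places the "A and Z" entry first because its product is larger.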

FIGS. 7A and 7B are flowcharts showing the operation of extracting the single variables and/or combinations of variables that satisfy the condition, together with their conditional probabilities. The processing shown in these flowcharts corresponds to step S14 in FIG. 2.

The CPU 11 extracts the data around the straight line 3 (S30). Next, the CPU 11 calculates the appearance ratio of each variable for the extracted data (S31). Here, the appearance ratio of a variable means the proportion of data taking the variable's mode, or the proportion of data in the most populated classes of the variable's histogram.
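For a variable whose values are already discrete (category labels or histogram class labels), the mode-based appearance ratio of step S31 can be sketched as follows. The function name `appearance_ratio` and the sample labels are assumptions made for illustration.

```python
from collections import Counter


def appearance_ratio(values):
    """Return (mode, share of records that take the mode) for one variable."""
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Hypothetical production-region labels for the extracted records
mode, ratio = appearance_ratio(["JP", "JP", "US", "JP"])
```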

The CPU 11 repeats the processing of steps S32 to S46 for each variable.
First, the CPU 11 selects one of the variables as the first variable (S32). The CPU 11 determines whether the appearance ratio of this variable exceeds 50% (S33). If the appearance ratio does not exceed 50% (No), the CPU 11 proceeds to step S34; if it exceeds 50% (Yes), the CPU 11 proceeds to step S36. The threshold for the appearance ratio of a variable is not limited to 50%, however, and may be any value determined in advance.

In step S34, the CPU 11 determines whether the number of times the range has been expanded is larger than a specified number. If it is not larger (No), the CPU 11 expands the range of this variable (S35) and returns to step S33. If it is larger (Yes), the CPU 11 proceeds to step S36.

The process of expanding the variable range in step S34 is explained using FIGS. 8A and 8B. FIG. 8A shows a histogram of variable A. The range of values of variable A can be set suitably by using Sturges' formula, shown in Expression (1).
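Expression (1) itself does not appear in this text. For reference, Sturges' formula in its usual textbook form is given below; the symbols k and n are chosen here and may differ from the notation used in the original drawing.

```latex
k = 1 + \log_2 n \qquad (1)
```

Here n is the number of data points and k is the number of histogram classes. The width of each class then follows from dividing the span of the variable's values by k.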

Data 63 is the mode of variable A, that is, the data for which variable A lies in the range 10 to 20. Here the appearance ratio of data 63 is 50% or less, so the range of the variable is expanded.

FIG. 8B shows that, in addition to data 63 of variable A, the next most frequent data 64 has been added to the range. This range expansion works the same way for quantitative and qualitative data. Thus, even for a single variable, expanding the range can raise the appearance ratio to the threshold or above.
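The loop of steps S33 to S35 can be sketched as follows. It is a rough illustration that works on histogram class labels, adds the next most frequent class at each expansion as in FIG. 8B, and stops once the expansion count is larger than the specified number. The function name, the parameter names, and the sample labels are assumptions.

```python
from collections import Counter


def expand_range(bin_labels, threshold=0.5, specified_count=3):
    """Grow the selected range from the mode until its share exceeds `threshold`.

    Returns (selected class labels, share of records they cover).
    """
    total = len(bin_labels)
    if total == 0:
        return [], 0.0
    ranked = Counter(bin_labels).most_common()  # Classes in descending order of count
    chosen = [ranked[0][0]]
    covered = ranked[0][1]
    expansions = 0
    # S33: stop when the share exceeds the threshold.
    # S34: stop when the expansion count is larger than the specified number.
    while (covered / total <= threshold
           and expansions <= specified_count
           and len(chosen) < len(ranked)):
        label, count = ranked[len(chosen)]  # S35: add the next most frequent class
        chosen.append(label)
        covered += count
        expansions += 1
    return chosen, covered / total


labels = ["10-20"] * 4 + ["20-30"] * 3 + ["30-40"] * 2 + ["40-50"]
selected, share = expand_range(labels)
```

In this hypothetical sample the mode covers 40% of the records, so one expansion adds the class "20-30" and the share becomes 70%.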

Returning to FIG. 7A, the description continues. In step S36, the CPU 11 records the appearance ratio of the variable as its conditional probability. The CPU 11 also records the correlation coefficient for the variable (S37) and proceeds to step S38 in FIG. 7B.
The CPU 11 repeats the processing of steps S38 to S44 for the other variables, excluding this variable. First, the CPU 11 selects one of the other variables as the second variable (S38) and calculates the appearance ratio of the combination of the first selected variable and the second selected variable (S39).

The CPU 11 determines whether the appearance ratio of the variable combination exceeds 40% (S40). If the appearance ratio of the combination does not exceed 40% (No), the CPU 11 proceeds to step S43; if it exceeds 40% (Yes), the CPU 11 proceeds to step S41. The threshold for the appearance ratio of a variable combination is not limited to 40%, however, and may be any value determined in advance.

In step S41, the CPU 11 records the appearance ratio of the variable as its conditional probability. The CPU 11 also records the correlation coefficient for the variable (S42) and proceeds to step S43.

In step S43, the CPU 11 selects the next variable, excluding the first variable, as the second variable. Next, the CPU 11 determines whether processing has finished for all the other variables, that is, whether the selection of a next variable has failed (S44). If processing has not finished for all the other variables (No), the CPU 11 returns to step S39; if it has finished (Yes), the CPU 11 proceeds to step S45.

In step S45, the CPU 11 selects the next variable as the first variable. Next, the CPU 11 determines whether processing has finished for all variables, that is, whether the selection of a next first variable has failed (S46). If processing has not finished for all variables (No), the CPU 11 returns to step S33 in FIG. 7A; if it has finished (Yes), the CPU 11 ends the processing of FIG. 7B. The maximum number of variables selected is not limited to two, however, and may be any predetermined value.

The extraction of variable combinations in steps S39 to S40 is explained using FIGS. 9A and 9B. FIG. 9A shows a histogram of variable A. Data 61 is the mode of variable A, that is, the data for which variable A lies in the range 10 to 20.

FIG. 9B shows a histogram of variable Z. Data 62 is the mode of variable Z, that is, the data for which variable Z lies in the range 20 to 30.

The CPU 11 calculates data 61, the mode of variable A, and determines whether the appearance ratio of data 61 exceeds 50%. Here it exceeds 50%, so processing proceeds to step S36 and the extraction of variable combinations is performed.

Next, the CPU 11 calculates the modes of the other variables B to Z for data 61 and calculates the appearance ratio of each variable combination. Specifically, the appearance ratio of variable A in the range 10 to 20 together with variable B in the range 20 to 30 is 45%. The appearance ratio of variable A in the range 10 to 20 together with variable C in the range 30 to 40 is 30%. Likewise, the appearance ratio of variable A in the range 10 to 20 together with variable Z in the range 20 to 30 is 80%. In this way, the CPU 11 calculates the appearance ratio of the combination of a variable with each other variable. The same applies to the combinations of variable B with variables A and C to Z. The CPU 11 sorts these two-variable combinations in descending order of evaluation value and displays them on the display unit. In this way, even for variables of different kinds, the CPU can mechanically extract and display the combination with the highest appearance ratio.
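The pairing of steps S38 to S40 can be sketched as follows. The text does not state the denominator of the combined appearance ratio, so this sketch assumes that it is the number of all extracted records; the record layout (one dict of class labels per record), the function name, and the sample data are likewise assumptions.

```python
from collections import Counter


def pair_appearance_ratios(records, first_var, first_value, threshold=0.4):
    """Pair `first_var == first_value` with the mode of every other variable.

    Returns (variable, mode, joint share of all records) for each pair whose
    share exceeds `threshold`, sorted in descending order of share.
    """
    if not records:
        return []
    # Records that fall in the selected range of the first variable (data 61 in FIG. 9A)
    subset = [r for r in records if r[first_var] == first_value]
    results = []
    for var in records[0]:
        if var == first_var:
            continue
        counts = Counter(r[var] for r in subset)
        if not counts:
            continue
        value, count = counts.most_common(1)[0]
        ratio = count / len(records)  # Assumed denominator: all extracted records
        if ratio > threshold:
            results.append((var, value, ratio))
    return sorted(results, key=lambda item: item[2], reverse=True)


records = [
    {"A": "10-20", "B": "20-30", "Z": "20-30"},
    {"A": "10-20", "B": "20-30", "Z": "20-30"},
    {"A": "10-20", "B": "30-40", "Z": "20-30"},
    {"A": "10-20", "B": "40-50", "Z": "20-30"},
    {"A": "20-30", "B": "20-30", "Z": "30-40"},
]
pairs = pair_appearance_ratios(records, "A", "10-20")
```

In this hypothetical sample only the pairing with variable Z passes the 40% threshold; the pairing with variable B reaches exactly 40% and is therefore not kept.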

FIG. 10 shows the initial setting screen 4 for correlation extraction.
The initial setting screen 4 includes a data selection combo box 41, an objective variable combo box 42, an explanatory variable combo box 43, an OK button 44, and a cancel button 45.

The data selection combo box 41 is a combo box (menu) for selecting the analysis data 161 from which correlations are to be extracted. Here, "operation log of device A" is selected.
The objective variable combo box 42 is a combo box for selecting the objective variable from the variables included in the analysis data 161. Here, "number of part replacements" is selected.

The explanatory variable combo box 43 is a combo box for selecting the explanatory variable from the variables included in the analysis data 161. Here, "machine usage time" is selected.
The OK button 44 is a button for executing correlation extraction on the analysis data 161 selected in the data selection combo box 41.

The cancel button 45 is a button for discarding the selections made in the combo boxes and closing the initial setting screen 4.
By operating the initial setting screen 4, the user can set the analysis data, the objective variable, and the explanatory variable.

FIG. 11 is a diagram showing the analysis result 5 obtained by extracting correlations. The analysis result 5 is displayed in the process of step S19 in FIG. 2.
The analysis result 5 includes a number column, a target variable column, a linear equation column, an evaluation value column, a variable name #1 column and range #1 column, and a variable name #2 column and range #2 column. The variable name #n and range columns further to the right are omitted.

The number column shows the correlation ranking number.
The target variable column shows the objective variable name and the explanatory variable name. Here, "number of part replacements × machine usage time" is shown.

The linear equation column shows the constant term and the slope (first-order coefficient) of the straight line. It is written here as "y = ax + b", but in practice specific numerical values appear for a and b.
The evaluation value column shows the evaluation value of each single variable and/or combination of variables. Here, the evaluation value is the product of the correlation coefficient and the conditional probability.
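As a small illustration of the evaluation value defined above, the sketch below multiplies a correlation coefficient by a conditional probability and sorts the candidates in descending order. The `Candidate` class, the `rank` function, and all numerical values are hypothetical; only the product definition and the descending ranking come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    condition: str       # a single variable or combination of variables, with ranges
    correlation: float   # correlation coefficient of the extracted data
    probability: float   # conditional probability of the condition

    @property
    def evaluation(self) -> float:
        # Evaluation value as defined in the text: the product of the two.
        return self.correlation * self.probability


def rank(candidates):
    """Return the candidates sorted by evaluation value, highest first."""
    return sorted(candidates, key=lambda c: c.evaluation, reverse=True)


# Made-up numbers for illustration only.
candidates = [
    Candidate("variable B 20-30", 0.95, 0.50),
    Candidate("part number A01, humidity 20-30", 0.90, 0.80),
    Candidate("variable C 30-40", 0.40, 0.90),
]
ranking = rank(candidates)
for position, c in enumerate(ranking, start=1):
    print(position, c.condition, round(c.evaluation, 3))
```

Because the product is used as written, a strongly negative correlation would rank last in this sketch. The text does not say whether the absolute value is intended, so the sketch does not assume it.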

Each variable name column shows the single variable and/or combination of variables for that ranking entry. The range column to the right of each variable name shows the range that gives the mode of that variable. In this way, even when the analyst does not understand the content of each variable constituting the data, single variables and/or combinations of variables that form suitable correlation conditions can be extracted automatically and ranked in descending order of evaluation value.
As a result, even when various kinds of data are mixed, suitable conditions can be extracted without the characteristics of the data being buried. Furthermore, knowledge hidden in the data can be extracted, for example a condition under which the explanatory variable, when within a certain range, strongly affects the objective variable. In this embodiment, as shown in FIG. 11, for the number of part replacements and the machine usage time, the correlation is highest under the condition that the part number is A01 and the humidity is in the range 20 to 30.

<<Second Embodiment>>
In the second embodiment, the analysis data is first clustered, a center of gravity is then obtained for each cluster, and single variables and/or combinations of variables that form suitable correlation conditions are extracted.

FIG. 12 is a flowchart showing the correlation extraction process for clustered analysis data.
The CPU 11 displays a menu of objective variables and a menu of explanatory variables on the display unit 15 in a selectable manner. The user selects the objective variable and the explanatory variable from the menus displayed on the display unit 15 by using the input unit 14 (S50). The CPU 11 then starts the series of operations in steps S51 to S59.
Selecting the objective variable and the explanatory variable determines the two-variable scatter diagram of FIG. 13. Each piece of data 2 included in the analysis data 161 is plotted on this scatter diagram. The horizontal axis of the graphs in FIGS. 13 to 15 is the machine usage time, and the vertical axis is the number of part replacements.

The CPU 11 sets the number of clusters k to an initial value of 2 (S51), proceeds to step S52, and performs clustering by k-means.

The CPU 11 determines whether there is any cluster 22 containing fewer than 30 data items (S53). The threshold of 30 data items is only an example; any number sufficient for a sample may be used. The number required for a sample may vary depending on the analysis data 161 and the variables.

If every cluster 22 contains 30 or more data items (No), the CPU 11 increments the number of clusters k by one (S54) and returns to step S52. If any cluster 22 contains fewer than 30 data items (Yes), the CPU 11 proceeds to step S55 and takes the (k−1) clusters 22 obtained in the preceding iteration as the processing target.
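The loop in steps S51 to S55 can be sketched as follows. This is an illustrative outline under several assumptions: a plain Lloyd's-algorithm k-means with a deterministic initialisation stands in for whatever k-means implementation is actually used, the function names are hypothetical, and when even k = 2 produces a small cluster the sketch falls back to treating all data as a single cluster (the k−1 = 1 case).

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns a list of k clusters (lists of points)."""
    # Deterministic initialisation (an assumption): k points spread over the sorted data.
    ordered = sorted(points)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]  # keep the old center if a cluster became empty
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def choose_clusters(points, min_size=30):
    """Increase k from 2 until some cluster has fewer than min_size points (S53),
    then return the clustering obtained with the previous k (S55)."""
    previous = [list(points)]  # assumed fallback: one cluster holding all data
    k = 2                      # S51
    while True:
        clusters = kmeans(points, k)  # S52
        if any(len(c) < min_size for c in clusters):
            return previous
        previous = clusters
        k += 1                 # S54


# Three well-separated, made-up groups of 40 points each.
points = [
    (cx + i % 8, cy + i // 8)
    for cx, cy in [(0, 0), (100, 100), (200, 0)]
    for i in range(40)
]
clusters = choose_clusters(points)
```

With this synthetic data, k = 2 and k = 3 both yield clusters of at least 30 points, k = 4 splits one group into pieces smaller than 30, and the three clusters from k = 3 are returned.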

FIG. 14 shows an example of the result of clustering. In FIG. 14, the data is divided into three clusters 22a, 22b, and 22c. The clusters 22a, 22b, and 22c contain centers of gravity 21a, 21b, and 21c, respectively. Hereinafter, when the clusters need not be distinguished, they are referred to simply as clusters 22.
The CPU 11 draws straight lines 3a, 3b, and 3c through the centers of gravity 21a, 21b, and 21c of the clusters 22a, 22b, and 22c, respectively. While rotating these straight lines 3a, 3b, and 3c simultaneously, the CPU 11 obtains the correlation coefficient and the conditional probability of each single variable and/or combination of variables (S56). The straight lines 3a, 3b, and 3c are shown in FIG. 15, described later. The processing in step S56 corresponds to the processing in steps S13 to S16 in FIG. 2.
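The rotation of a straight line through a center of gravity can be sketched as below for a single cluster. It is only an approximation of the described processing: the 1-degree step and the 25% extraction share are taken from the modification list, "deviation" is interpreted as perpendicular distance, the nearest 25% of points stand in for the threshold test, and the function names and sample points are hypothetical. The conditional probability part of step S56 is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sweep_lines(points, step_deg=1, keep_ratio=0.25):
    """For each line through the centroid, keep the nearest points and
    return a list of (angle in degrees, kept points, correlation coefficient)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    keep = max(2, int(n * keep_ratio))
    results = []
    for angle in range(0, 180, step_deg):
        theta = math.radians(angle)

        def deviation(p):
            # Perpendicular distance from p to the line through the centroid
            # with direction (cos theta, sin theta).
            return abs((p[0] - cx) * math.sin(theta) - (p[1] - cy) * math.cos(theta))

        nearest = sorted(points, key=deviation)[:keep]
        r = pearson([x for x, _ in nearest], [y for _, y in nearest])
        results.append((angle, nearest, r))
    return results


# Made-up data: 20 points on y = x plus 20 scattered points.
points = [(i, i) for i in range(20)] + [(i, (7 * i) % 20) for i in range(20)]
results = sweep_lines(points)
angle, nearest, r = results[45]
```

In the second embodiment the same sweep would be applied to each cluster 22 with its own center of gravity. In this toy data the line at 45 degrees picks up only points lying on y = x, so the correlation coefficient of the extracted points is close to 1.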

Next, the CPU 11 calculates the evaluation value of each single variable and/or combination of variables from the correlation coefficient and the conditional probability (S57). The CPU 11 sorts the single variables and/or combinations of variables in descending order of evaluation value (S58), displays the sorted single variables and/or combinations of variables on the display unit 15 (S59), and ends the processing of FIG. 12. The processing in steps S57 to S59 corresponds to the processing in steps S17 to S19 in FIG. 2.

(Modifications)
The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. It is also possible to add, delete, or replace other configurations for part of the configuration of each embodiment.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example as an integrated circuit. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as a flash memory card or a DVD (Digital Versatile Disk).

In each embodiment, the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the components may be considered to be interconnected.
Modifications of the present invention include, for example, the following (a) to (h).

(a) In the first embodiment, the straight line is drawn through a single point, the center of gravity. However, a plurality of straight lines may be drawn through a plurality of points as in the second embodiment, and there is no limitation in this respect.
(b) The clustering method is not limited to k-means and may be any method.
(c) Any number of variables may be combined in an extracted combination, but empirically up to three is preferable.
(d) The method of extracting conditions after extracting the data near the straight line is not limited to the processing of FIGS. 7A and 7B. An association rule extraction technique may be used to extract items that correlate with the objective variable and the explanatory variable.
(e) The step of rotating the straight line is not limited to rotation in 1-degree increments, and the line may be rotated in increments of any predetermined angle.
(f) Only 25% of the data, namely the data whose deviation from the straight line does not exceed the threshold, is extracted. However, the share is not limited to 25%, and any proportion may be extracted.
(g) The single variables and/or combinations of variables are sorted and displayed as a ranking in descending order of the product of the correlation coefficient and the conditional probability. However, this is not a limitation, and the single variables and/or combinations of variables may instead be sorted by correlation coefficient and displayed as a ranking.
(h) The computer accepts a designation of two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable. However, this is not a limitation, and the computer itself may select two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable.

1 Computer
11 CPU
12 ROM
13 RAM
14 Input unit
15 Display unit
16 Storage unit
161 Analysis data
162 Correlation extraction program
2 Data
21, 21a to 21c Center of gravity
22a to 22c Cluster
3, 3a to 3c Straight line
4 Initial setting screen
41 Data selection combo box
42 Objective variable combo box
43 Explanatory variable combo box
44 OK button
45 Cancel button
5 Analysis result

Claims

A correlation extraction method, characterized in that a computer performs:
a step of accepting designation of two variables among a plurality of variables constituting analysis data;
a step of calculating each straight line passing through the center of gravity of the analysis data in a scatter diagram of the two variables;
a step of extracting each piece of data whose deviation from each straight line does not exceed a threshold;
a step of calculating each correlation coefficient from each piece of the data;
a step of extracting, from each piece of the extracted data, a single variable and/or combination of variables whose appearance ratio is greater than a predetermined value; and
a step of displaying the single variable and/or combination of variables on a display unit based on each correlation coefficient and each appearance ratio.

The correlation extraction method according to claim 1, wherein, in the step of calculating each straight line, the computer rotates the straight line passing through the center of gravity by a predetermined angle at a time.

The correlation extraction method according to claim 1, wherein the computer performs a step of clustering the analysis data and then performs the step of designating the two variables among the plurality of variables constituting the analysis data, and
each straight line passes through the center of gravity of a respective cluster of the analysis data in the scatter diagram of the two variables.

The correlation extraction method according to any one of claims 1 to 3, wherein, in the step of displaying the single variable and/or combination of variables on the display unit, the computer displays the single variable and/or combination of variables in descending order of the product of each correlation coefficient and each appearance ratio.

The correlation extraction method according to any one of claims 1 to 3, wherein, in the step of displaying the single variable and/or combination of variables, the computer displays the single variable and/or combination of variables in descending order of each correlation coefficient.

The correlation extraction method according to any one of claims 1 to 5, characterized in that, when extracting the single variable and/or combination of variables, the computer performs a step of further combining another variable if, among the ranges set so as to divide the data of the single variable and/or combination of variables into a predetermined number of classes, the appearance ratio of the data in the modal range is greater than or equal to a predetermined value.

The correlation extraction method according to any one of claims 1 to 6, characterized in that, when extracting the single variable, the computer performs a step of expanding the range if, among the ranges set so as to divide the data of the single variable into a predetermined number of classes, the appearance ratio of the data in the modal range is less than the predetermined value.

A correlation extraction method, characterized in that a computer performs:
a step of selecting two variables from among a plurality of variables constituting analysis data;
a step of calculating each straight line passing through the center of gravity of the analysis data in a scatter diagram of the two variables;
a step of extracting each piece of data whose deviation from each straight line does not exceed a threshold;
a step of calculating each correlation coefficient from each piece of the data;
a step of extracting, from each piece of the extracted data, a single variable and/or combination of variables whose appearance ratio is greater than a predetermined value; and
a step of displaying the single variable and/or combination of variables on a display unit based on each correlation coefficient and each appearance ratio.

A correlation extraction program for causing a computer to execute:
a step of accepting designation of two variables among a plurality of variables constituting analysis data;
a step of calculating each straight line passing through the center of gravity of the analysis data in a scatter diagram of the two variables;
a step of extracting each piece of data whose deviation from each straight line does not exceed a threshold;
a step of calculating each correlation coefficient from each piece of the data;
a step of extracting, from each piece of the extracted data, a single variable and/or combination of variables whose appearance ratio is greater than a predetermined value; and
a step of displaying the single variable and/or combination of variables on a display unit based on each correlation coefficient and each appearance ratio.

A correlation extraction program for causing a computer to execute:
a step of selecting two variables from among a plurality of variables constituting analysis data;
a step of calculating each straight line passing through the center of gravity of the analysis data in a scatter diagram of the two variables;
a step of extracting each piece of data whose deviation from each straight line does not exceed a threshold;
a step of calculating each correlation coefficient from each piece of the data;
a step of extracting, from each piece of the extracted data, a single variable and/or combination of variables whose appearance ratio is greater than a predetermined value; and
a step of displaying the single variable and/or combination of variables on a display unit based on each correlation coefficient and each appearance ratio.