WO2011016281A2 - Information processing device and program for learning bayesian network structure - Google Patents
- Publication number
- WO2011016281A2 (PCT/JP2010/058963)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- random variable
- edge
- random
- mutual information
- pair
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- The present invention relates to an information processing apparatus and a program for Bayesian network structure learning, and more particularly to an information processing apparatus and a program that can perform Bayesian network structure learning at high speed and with a stable calculation time even when there are a large number of random variables and a large amount of data.
- A Bayesian network, which is an expression of probabilistic causal relations, is known for its high inference accuracy.
- Bayesian network structure learning is known to be NP-complete with respect to the number of nodes, and learning a structure over a large number of random variables from a large amount of data is a very difficult problem.
- Bayesian network structure learning algorithms are roughly classified into scoring-based methods, which extract causal relations with the aim of maximizing the approximation between the true joint probability distribution and the joint probability distribution expressed by the Bayesian network, and constraint-based methods (also called CI (Conditional Independence)-based methods), which aim at judging independence between random variables.
- A constraint-based learning algorithm is a generic name for algorithms that perform conditional independence tests between random variables using the given data and determine, based on the results, whether edges (also called arcs) are required between random variables. When dealing with large-scale sets of random variables, it is known that inference accuracy is higher when a method that determines independence between variables is used.
- TPDA (Three Phase Dependency Analysis)
- MDF (monotone DAG faithful)
- The random variable set U is given together with its corresponding joint probability distribution P(U).
- In TPDA, three phases called the tree structure preparation (Drafting) phase, the edge increase (Thickening) phase, and the edge reduction (Thinning) phase are executed, and the structure is learned by finally directing the edges.
- The main input is tabular data expressed in CSV format or a relation in a relational database, together with a description of what random variables and what states (realized values) are included in the data (hereinafter referred to as "data specification description information").
- The conditional mutual information is calculated with the nodes adjacent to the variable nodes as the first condition set.
- the conditional mutual information is calculated by the following equation (3).
- Here, the bold capital letter C represents a random variable set, and the bold lower-case letter c represents the corresponding set of state values of those variables. If the calculated conditional mutual information is equal to or greater than the predetermined threshold ε, the random variable set C (bold) is reduced and the calculation of Expression (3) is repeated. When a condition set for which the conditional mutual information is less than ε is found, a record is registered in the global cut set and no undirected edge is added between the random variable pair. On the other hand, if no condition set whose conditional mutual information is less than ε is found before the condition set becomes the empty set, an undirected edge is added between the random variable pair.
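- Equation (3) is not reproduced in this text; the following is a reconstruction of the standard conditional mutual information that the surrounding description appears to assume, where the sum runs over all states x, y and all state vectors c of the condition set C:

```latex
I(X, Y \mid \mathbf{C}) \;=\; \sum_{x,\, y,\, \mathbf{c}} P(x, y, \mathbf{c})\,
  \log \frac{P(x, y \mid \mathbf{c})}{P(x \mid \mathbf{c})\, P(y \mid \mathbf{c})}
```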
- In the edge reduction phase, an undirected edge under examination is temporarily deleted.
- The conditional mutual information is then calculated, while gradually reducing the condition set, until a condition set whose value is less than the threshold ε is found. If such a condition set is found, the record is registered in the global cut set and the undirected edge remains deleted. If no such condition set is found, the temporarily deleted undirected edge is added back and restored.
- If either random variable of the pair has three or more adjacent nodes other than the counterpart of the pair, a more precise edge necessity determination is performed.
- After the edge increase and edge reduction phases, work is performed to determine the direction of the undirected edges added by these phases. Specifically, for every set of three nodes X, Y, and Z in which X and Y, and Z and Y, are directly connected by undirected edges but X and Z are not directly connected, the direction is determined for each edge whose direction can be determined, based on whether or not these nodes are included in the global cut set.
- In TPDA, the Bayesian network structure is learned as described above.
- The TPDA described above is the fastest currently known constraint-based learning algorithm, and is suitable for processing large numbers of variables and large amounts of data.
- However, when the number of random variables becomes even larger, it becomes difficult to carry out the conditional independence tests. That is, since the number of variables corresponding to c (bold) in the joint probability distribution P(x, y, c (bold)) on the right side of the conditional mutual information shown in Expression (3) increases, the amount of calculation increases and the calculation becomes difficult.
- In addition, as the number of patterns of the joint probability distribution increases, more missing values are generated that do not contribute to the calculation result.
- The inventors of the present application discovered techniques that can perform Bayesian network structure learning faster and with less variation in processing time than the conventional TPDA described above.
- The inventors of the present application also found that, when searching for a cut set, by calculating the conditional mutual information in ascending order of the size of the condition set and applying the MDF assumption as early as possible, the cut set can be searched in at most three steps. Specifically, the conditional mutual information obtained when two condition variables are given is compared with that obtained when each of those condition variables is given individually, and variables that cannot be included in the cut set are deleted from the search targets at the second level. As a result, Bayesian network structure learning can be performed at higher speed than with conventional TPDA.
- the present invention provides an information processing apparatus and program for performing Bayesian network structure learning based on these new algorithms.
- the information processing apparatus of the present invention is an information processing apparatus that performs Bayesian network structure learning from input data that includes information about a plurality of random variables and the states that each random variable takes.
- The information processing apparatus includes means for generating a tree-structured graph for the input data, which adds an edge between the random variables of each random variable pair whose mutual information is equal to or greater than a first threshold value.
- The information processing apparatus further includes means for adding an edge, when an edge is necessary, for each random variable pair whose mutual information is equal to or greater than the first threshold but to which no edge was added by the means for generating the tree-structured graph.
- For the two random variable nodes constituting a random variable pair to which no edge was added, this means uses, as the condition set, the set of random variables included in the set of nodes that lie on a path between the two random variable nodes and are adjacent to one of them. If a condition set whose conditional mutual information is less than the first threshold is found, no edge is added between the two random variables; otherwise an edge is added.
- In the calculation of the conditional mutual information, this means omits the calculation of the related component when the joint probability distribution for the states of the two random variables and the state set corresponding to each condition variable is less than a second threshold value that is equal to or less than the first threshold value.
- The means for generating the tree-structured graph may calculate the marginal probability of each random variable included in the input data and, when the marginal probability of a state of either random variable constituting a random variable pair is less than the second threshold, calculate the mutual information of that random variable pair while omitting the calculation of the related mutual information component.
- The means for adding an edge may use, as the final condition set, the set of nodes that lie on a path between the two random variable nodes constituting a random variable pair to which no edge was added and are adjacent to one of the two random variable nodes, and may calculate the conditional mutual information about the two random variable nodes and the condition set C while increasing the size of the condition set C from 1 up to the size of the final condition set.
- The information processing apparatus further includes means for determining, for each random variable pair having an edge after the processing by the means for adding an edge, whether the edge is necessary, deleting the edge if it is unnecessary, and directing each remaining edge.
- The information processing apparatus of the present invention is also an information processing apparatus that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes.
- The information processing apparatus determines whether an edge should be added between a certain random variable pair by obtaining the conditional mutual information. At the time of this determination, for the two random variable nodes constituting the random variable pair, the conditional mutual information is calculated using, as the condition set, the set of random variables included in the set of nodes that lie on a path between the two random variable nodes and are adjacent to one of them, and no edge is added when a condition set whose value is less than the first threshold is found.
- In the calculation, when the joint probability distribution for the states of the two random variables and the state set corresponding to each condition variable is less than a second threshold that is equal to or less than the first threshold, the calculation of the related components is omitted.
- The information processing apparatus may use, as the final condition set, the set of nodes that lie on the path between the two random variable nodes and are adjacent to either of them, and may calculate the conditional mutual information about the two random variable nodes and the condition set C while increasing the size of the condition set C from 1 up to the size of the final condition set.
- The present invention is also a program that causes a computer to operate as the information processing apparatus described above.
- the information processing apparatus of the present invention is an information processing apparatus that performs Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes.
- The information processing apparatus includes means for generating a tree-structured graph for the input data, which adds an edge between the random variables of each random variable pair whose mutual information is equal to or greater than a first threshold value.
- The information processing apparatus further includes means for adding an edge, when an edge is necessary, for each random variable pair whose mutual information is equal to or greater than the first threshold but to which no edge was added by the means for generating the tree-structured graph.
- The means for adding an edge uses, as a candidate condition set, a condition set containing the random variables included in the set of nodes that lie on a path between the two random variable nodes constituting a random variable pair to which no edge was added and are adjacent to one of the two random variable nodes. It calculates the conditional mutual information using each single random variable in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing for the random variable pair is terminated without adding an edge between the two random variables.
- When the above processing has not terminated, the means for adding an edge calculates the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing is terminated without adding an edge between the two random variables.
- In addition, if a calculated conditional mutual information value is larger than the conditional mutual information already calculated using only one of the two random variables as the condition set, that one random variable is deleted from the candidate condition set, and if it is larger than the conditional mutual information already calculated using only the other random variable as the condition set, the other random variable is deleted from the candidate condition set.
- When the above processing has not terminated, the means for adding an edge calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and if the calculated conditional mutual information is less than the first threshold, the processing is terminated without adding an edge between the two random variables.
- The information processing apparatus further includes means for adding an edge between the two random variables when the above processing has not terminated.
- The information processing apparatus further includes means for determining, for each random variable pair having an edge after the above processing, whether the edge is necessary, deleting the edge if it is unnecessary, and directing each edge.
- The information processing apparatus of the present invention is also an information processing apparatus that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes. The information processing apparatus determines whether an edge should be added between a certain random variable pair by obtaining the conditional mutual information.
- For the two random variable nodes constituting the random variable pair, the information processing apparatus uses, as a candidate condition set, a condition set containing the random variables included in the set of nodes that lie on a path between the two random variable nodes and are adjacent to either of them. It calculates the conditional mutual information using each single random variable in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing for the random variable pair is terminated without adding an edge between the two random variables.
- When the above processing has not terminated, the information processing apparatus calculates the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, and if at least one of the calculated conditional mutual information values is less than the first threshold, the processing is terminated without adding an edge between the two random variables.
- In addition, if a calculated conditional mutual information value is larger than the conditional mutual information already calculated using only one of the two random variables as the condition set, that one random variable is deleted from the candidate condition set, and if it is larger than the conditional mutual information already calculated using only the other random variable as the condition set, the other random variable is deleted from the candidate condition set.
- When the above processing has not terminated, the information processing apparatus calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and if the calculated conditional mutual information is less than the first threshold, the processing is terminated without adding an edge between the two random variables.
- The information processing apparatus further includes means for adding an edge between the two random variables when the above processing has not terminated.
- The present invention is also a program that causes a computer to operate as the information processing apparatus described above.
- According to the present invention, Bayesian network structure learning can be performed at high speed and with a stable calculation time. Therefore, according to this invention, the industrial application range of Bayesian networks can be expanded.
- FIG. 1 is a block diagram of an information processing apparatus for performing Bayesian network structure learning according to an embodiment of the present invention.
- FIG. 2 is a diagram showing an example of the information contained in the input data. FIG. 3 is a flowchart showing the processing performed by the information processing apparatus of the present invention.
- FIG. 4 is a flowchart showing the process of step 310 of FIG. 3 in more detail. FIG. 5 is a diagram showing the pseudo code of Apriori.
- FIG. 6 is a diagram illustrating an example of the genFreqItemSet1 routine. FIG. 7 is a diagram showing an example of the calcMutualInformation routine for calculating mutual information in the present invention when Apriori is adopted and incorporated as an example of the frequent item set extraction algorithm.
- FIG. 11 is a diagram illustrating an example of the edgeNeeded_H routine that is called in the thickening routine of FIG. 8 and is used in the processes of FIGS. 9 and 10. FIGS. 12A and 12B are diagrams each showing an example of the edgeNeededBody routine called in the edgeNeeded_H routine.
- Further drawings show, for the second embodiment of the present invention, a flowchart showing the process of step 310 of FIG. 3 in more detail, an example of the thickening routine, and a flowchart showing details of the main processes executed in the thickening routine.
- FIG. 26 is a diagram showing an example of an edgeNeeded_H routine that is called in the thickening routine of FIG. 24 and used for the processing of FIG. 25 in the second embodiment of the present invention.
- Further drawings show, for the second embodiment of the present invention, an example of the SearchCutSet routine called in the edgeNeeded_H routine and the edgeNeeded routine, a flowchart showing the details of the edge reduction process, an example of the Thinning routine, and an example of the edgeNeeded routine called in the Thinning routine.
- A further drawing shows an example of the orientationEdge routine used for edge orientation in the second embodiment of the present invention.
- FIG. 1 shows a block diagram of an information processing apparatus 100 for executing Bayesian network structure learning according to an embodiment of the present invention.
- the control unit 102 is a part that controls the flow of processing of the information processing apparatus 100 as a whole. As preprocessing, the control unit 102 checks whether arguments and parameters specified at the start of structure learning are normal. When these are normal, the control unit 102 causes the data specification analysis unit 104 to analyze the data specification. Thereafter, the control unit 102 causes the structure learning unit 110 to execute the main processing of the algorithm.
- the data specification analysis unit 104 reads the data specification description file 108 and prepares to analyze data that is the main input.
- the main input data is, for example, data in a tabular format expressed in a CSV format, a relation in a relational database, or the like.
- the data is input to the information processing apparatus 100 by a user and stored in the database 106 in the information processing apparatus 100, for example.
- Alternatively, the data may be stored in a database on a communication network connected to the information processing apparatus 100 in a wired or wireless manner, and the information processing apparatus 100 may access that database and receive the data in response to receiving a command that requests Bayesian network structure learning.
- the data store for storing data may be a file, a relational database, or a two-dimensional array on a memory. In the present embodiment, the following description will be given assuming that it is a relational database.
- Each column is composed of “ID” and “each random variable name”, and each row is composed of a corresponding “ID” and “state (realized value) of each random variable”.
- There are two types of coupons used by customers, T1 and T2 (these cannot be used together), and the case where no coupon is used is represented by n.
- A product purchased by the customer is represented by y, and a product that has not been purchased by n. The data represents purchase data for six customers.
- the data specification description file 108 is a file including information on the random variables included in the data and the state (realized value) of each random variable.
- the data specification description file 108 is, for example, a CSV format file.
- When the number of states of a random variable is n, each row is described as: random variable name, state 1, state 2, ..., state n.
- For example, the random variables representing the customer purchase behavior history data and their realized values are described in the data specification description file 108 as follows:
Coupon, T1, T2, n
A, y, n
B, y, n
C, y, n
D, y, n
- The data specification analysis unit 104 reads the data specification description file 108, holds information about the name of each random variable, the number of random variables, the state names of each random variable, the number of states of each random variable, and the total number of data records, and provides this information to the other components.
- the structure learning unit 110 is a part that executes the Bayesian network structure learning algorithm proposed in the present application, and a specific operation thereof will be described in detail below.
- the query management unit 112 calculates a mutual information amount and a conditional mutual information amount from the input data. In order to calculate the probability distribution necessary for these calculations, it is necessary to issue to the database 106 a database query that counts the number of data corresponding to the condition. As will be described in detail below, the present invention, in one embodiment, avoids performing calculations that are less necessary using a frequent item set extraction algorithm when calculating mutual information and conditional mutual information. Thus, the whole process is speeded up. Further, while the algorithm is being executed, the number of data with the same condition is referred to a plurality of times. However, since obtaining the query result is a relatively time-consuming process, it is inefficient to issue a query each time.
- Therefore, the query management unit 112 passes each query result, together with the corresponding search condition, to the query result cache unit 114, which holds it.
- When it needs to obtain the number of matching data records, the query management unit 112 first queries the query result cache unit 114; if the result has already been acquired, it uses that result, and if not, it issues the query and acquires the number of data records from the database 106.
- the query result cache unit 114 has, as an internal data structure, a hash table in which the query search condition is a key and the number of corresponding data items as a query result is a value, and holds the query result.
- The query result cache unit 114 has a function of responding to inquiries from the query management unit 112 as to whether a query result corresponding to a given search condition is already held, and a function of holding new key/value pairs.
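- As a minimal sketch (assumed data structures, not the actual implementation), the query result cache can be pictured as follows:

```python
class QueryResultCache:
    """Hash table whose key is a query search condition (a set of
    <variable, state> pairs) and whose value is the matching record count."""

    def __init__(self):
        self._table = {}

    def has(self, condition):
        # Answer whether a result for this search condition is already held.
        return frozenset(condition) in self._table

    def get(self, condition):
        return self._table.get(frozenset(condition))

    def put(self, condition, count):
        # Hold a new key/value pair.
        self._table[frozenset(condition)] = count
```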
- The cut set holding unit 116 has a function of holding the global cut set, whose elements are records each consisting of a random variable pair and the condition-part variable set for which the conditional mutual information calculated between the random variable pair is less than the threshold ε. The cut set is required for directing undirected edges.
- the graph structure construction unit 118 is a part having a function of constructing the graph structure of the Bayesian network estimated by the structure learning unit 110.
- The Bayesian network structure description file 120 is a file containing information on the structure of the Bayesian network estimated by the information processing apparatus 100. For example, when an estimated edge is directed, it is expressed as "parent variable name → child variable name", and when the direction of an edge cannot be determined and the edge remains undirected, it is expressed as "variable name 1 - variable name 2".
- For example, when Coupon is a parent variable of A and D, A is a parent variable of B, and B and C are connected by an undirected edge, the output Bayesian network structure description file 120 includes the following contents:
Coupon → A
Coupon → D
A → B
B - C
- Bayesian network structure learning can be performed at a higher speed and with less processing time variation than conventional TPDA. This will be described in detail.
- FIG. 3 is a flowchart showing processing by the information processing apparatus 100 according to the present embodiment.
- the information processing apparatus 100 starts processing (step 302).
- the instruction is configured to include predetermined operation parameters including connection information and a data specification description file name for accessing the database 106 that stores data serving as a basis for structure learning.
- The operation parameters include the threshold ε (for example, 0.01) used for the mutual information and the conditional mutual information in structure learning, and the minimum support σ used in frequent item set extraction (0 < σ ≤ ε, for example, 0.0005).
- the file name of the output Bayesian network structure description file may be included.
- the information processing apparatus 100 performs initial processing.
- The control unit 102 checks whether the operation parameters are normal (step 304). If there is an error, the control unit 102 terminates the processing (step 320). If they are normal, the control unit 102 causes the data specification analysis unit 104 to analyze the data specification (step 306).
- the data specification analysis unit 104 reads the data specification description file 108 and stores the names of the random variables, the number of random variables, the names of all the states that can be taken by the random variables, and the number of states. Next, the data specification analysis unit 104 connects to the database 106 using the database connection information, acquires the number of all data, and holds it (step 308). After step 308, the control unit 102 transfers control to the structure learning unit 110.
- the structure learning unit 110 performs a tree structure preparation process and generates a tree structure for the given data (step 310).
- the process of step 310 is shown in more detail in FIG.
- A frequent item set extraction algorithm is a general term for data mining algorithms that rapidly extract, from the item sets appearing in the data, those item sets whose support (that is, the joint probability that the item set appears) is equal to or higher than the minimum support σ. Such algorithms extract frequent item sets efficiently by pruning the search using the anti-monotonicity of support: when A and B are item sets with A ⊆ B, (support of A) ≥ (support of B), that is, if A is not a frequent item set, then no set B containing A is a frequent item set.
- As an example of a frequent item set extraction algorithm, Apriori (Agrawal, R. and Srikant, R.: Fast Algorithms for Mining Association Rules, in Proc. of the 20th Int'l Conference on Very Large Databases, pp. 487-499, Santiago, Chile (1994)) can be used. Pseudo code for Apriori is shown in FIG. 5.
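- For reference, the following is a minimal, self-contained sketch (not the routine of the figure) of how a level-wise Apriori-style extraction using the anti-monotonicity of support could look; the transaction and item representations are assumptions made for illustration:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent item set extraction using the anti-monotonicity of support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Join step: form size-k candidates; prune any candidate that has an
        # infrequent (k-1)-subset, because it cannot be frequent either.
        candidates = set()
        for a in frequent:
            for b in frequent:
                u = a | b
                if len(u) == k and all(frozenset(s) in all_frequent
                                       for s in combinations(u, k - 1)):
                    candidates.add(u)
        counts = {c: sum(1 for t in transactions if c <= set(t)) for c in candidates}
        frequent = {s for s, c in counts.items() if c / n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent
```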
- In the present embodiment, a set of random variables and their values is regarded as an item set, and among the components constituting the right side of Expression (2) expressing the mutual information, the components related to sets of random variables and values whose support is less than the minimum support σ are excluded from the calculation.
- The mutual information I(X, Y) of two different random variables X and Y is expressed by the following equation.
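- The equation itself is not reproduced here; the following is the standard definition of mutual information that the surrounding description appears to use:

```latex
I(X, Y) \;=\; \sum_{x,\, y} P(x, y)\, \log \frac{P(x, y)}{P(x)\, P(y)}
```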
- First, the query management unit 112 calculates the marginal probabilities of all possible states of each random variable.
- A pair of a random variable and a state (represented as <X, x>, <Y, y>, and so on) whose marginal probability is greater than or equal to the minimum support σ is added to the frequent item set F1 of size 1.
- At this time, the random variable and state are used as the search condition key, and the number of data records matching the search condition is stored as the value in the query result cache unit 114.
- An example of the genFreqItemSet1 routine for performing this procedure in the query management unit 112 is shown in FIG. 6.
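- As an illustration only (the actual routine is in the figure), a genFreqItemSet1-style step might be sketched as follows; the helper names count_query, states, and cache are assumptions:

```python
def gen_freq_item_set1(count_query, total, variables, states, sigma, cache):
    """Compute the marginal probability of every <variable, state> pair and keep
    those whose marginal probability is at least sigma in the size-1 frequent
    item set F1, caching each qualifying count under its search condition key."""
    f1 = set()
    for var in variables:
        for state in states[var]:
            key = frozenset({(var, state)})   # search condition key
            count = count_query(key)          # e.g. a COUNT(*) query against the database
            if count / total >= sigma:
                f1.add(key)
                cache[key] = count            # hold the query result for reuse
    return f1
```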
- the query management unit 112 calculates the mutual information amount for all the random variable pairs, and adds a pair of random variables whose mutual information amount is ⁇ or more to the random variable pair array. At this time, the query management unit 112 uses the above-described frequent item set extraction algorithm to speed up the calculation process.
- For each pair of a random variable and a state (for example, <X, x1> and <Y, y1>), the query management unit 112 determines whether each element of the pair (here, both <X, x1> and <Y, y1>) is included in the frequent item set F1, that is, whether the corresponding marginal probabilities are all equal to or greater than the minimum support σ (step 404). When at least one is not included in F1 (its marginal probability is less than σ) ("No" in step 404), the corresponding mutual information component on the right side of Equation (4) is not calculated.
- If both are included ("Yes" in step 404), it is determined whether the joint probability P(x, y) at this time is equal to or greater than σ (step 406). If the joint probability is greater than or equal to σ ("Yes" in step 406), the mutual information component at this time (Formula (6)) is calculated (step 408). Further, the current set of random variables and states (for example, <X, x1>, <Y, y1>) is used as the search condition key, and the number of data records matching the search condition is stored as the value in the query result cache unit 114. The current set of random variables and states is also added to the frequent item set F2 of size 2.
- The query management unit 112 repeats steps 404 to 408 for all such sets (step 410), and then adds up the mutual information components calculated so far to obtain the mutual information for the random variable pair currently focused on (step 412).
- If the obtained mutual information is ε or more, the structure learning unit 110 adds the random variable pair to the random variable pair array.
- Steps 404 to 412 are repeated for all random variable pairs, and mutual information is calculated for all random variable pairs (step 414).
- Next, the structure learning unit 110 sorts the random variable pairs stored in the random variable pair array in descending order of mutual information (step 416). Then, in order from the random variable pair with the largest mutual information, the graph structure construction unit 118 is asked whether the graph structure remains a tree structure even if an edge is added between the random variable pair (step 418). When the graph structure construction unit 118 notifies the structure learning unit 110 that adding the edge would not keep a tree structure ("No" in step 418), no edge is added and the pair remains in the random variable pair array. On the other hand, when the graph structure construction unit 118 notifies that the tree structure is still maintained ("Yes" in step 418), the structure learning unit 110 instructs the graph structure construction unit 118 to add an undirected edge between the random variable pair currently focused on and deletes the random variable pair from the random variable pair array (step 420). Steps 418 and 420 are repeated for all random variable pairs in the random variable pair array (step 422).
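- The tree check in steps 418 to 420 can be illustrated with a small sketch (hypothetical helpers; a union-find structure stands in for the graph structure construction unit's check):

```python
class DisjointSet:
    """Union-find used to test whether adding an edge keeps the graph a forest."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False              # already connected: the edge would create a cycle
        self.parent[ra] = rb
        return True

def draft_tree(pairs_with_mi, epsilon):
    """pairs_with_mi: list of ((X, Y), mutual_information)."""
    candidates = [(p, mi) for p, mi in pairs_with_mi if mi >= epsilon]
    candidates.sort(key=lambda t: t[1], reverse=True)     # descending mutual information
    dsu, edges, remaining = DisjointSet(), [], []
    for (x, y), mi in candidates:
        if dsu.union(x, y):
            edges.append((x, y))       # edge keeps a tree structure: add it
        else:
            remaining.append((x, y))   # left in the pair array for the Thickening phase
    return edges, remaining
```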
- As described above, in step 310, for each random variable pair whose mutual information is ε or more, an edge is added between the random variables if the graph structure remains a tree structure when the edge is added, thereby generating a tree-structured graph.
- At that time, the marginal probability of each state of each random variable included in the input data is calculated, and when the marginal probability of a state of either random variable constituting a random variable pair is less than σ, the calculation of the related mutual information component is omitted; the mutual information of each random variable pair is calculated in this way.
- FIG. 7 shows an example of a calcMutualInformation routine for calculating mutual information in this embodiment when Apriori is adopted as an example of the frequent item set extraction algorithm and incorporated.
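- Since the routine of FIG. 7 is not reproduced here, the following is a rough sketch of the pruned mutual information calculation described above (prob, states, and the item-set representation are assumptions made for illustration):

```python
import math

def calc_mutual_information(x_var, y_var, states, prob, sigma, f1):
    """Mutual information with frequent-item-set pruning: components whose
    marginal or joint probability falls below the minimum support sigma are
    skipped, and the joint sets found to be frequent are collected into F2."""
    mi, f2 = 0.0, set()
    for x in states[x_var]:
        for y in states[y_var]:
            # Step 404: both <X, x> and <Y, y> must be size-1 frequent items.
            if frozenset({(x_var, x)}) not in f1 or frozenset({(y_var, y)}) not in f1:
                continue
            # Step 406: skip the component if the joint probability is below sigma.
            p_xy = prob({(x_var, x), (y_var, y)})
            if p_xy < sigma:
                continue
            # Step 408: accumulate the mutual information component.
            p_x, p_y = prob({(x_var, x)}), prob({(y_var, y)})
            mi += p_xy * math.log(p_xy / (p_x * p_y))
            f2.add(frozenset({(x_var, x), (y_var, y)}))
    return mi, f2
```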
- Next, in step 312, the structure learning unit 110 executes the edge increase process. For each random variable pair whose mutual information is ε or more but for which no undirected edge was added because the graph would not remain a tree structure (that is, each random variable pair remaining in the random variable pair array), the conditional mutual information is used to determine whether an edge is actually needed, and an edge is added if it is needed.
- An example of the thickening (edge increase) routine at this time is shown in FIG. 8.
- FIG. 9 shows details of main processes executed in the thickening routine.
- For each random variable pair whose mutual information is ε or more but which does not have an undirected edge (that is, each random variable pair remaining in the random variable pair array), the structure learning unit 110 sets, as the final condition set (ConditionSet, C (bold)), the set of nodes that exist on a path having one of the two random variable (for example, X and Y) nodes constituting the pair as the start point and the other as the end point and that are adjacent to one of these two random variable nodes (step 902).
- a candidate condition set C ′ (bold) having the same random variable set as the final condition set is generated.
- Next, the structure learning unit 110 focuses on one random variable included in the final condition set (for example, {C1, C2, C3, C4, ...}), that is, one of the random variables adjacent to X or Y on the path between X and Y, and causes the query management unit 112 to calculate the conditional mutual information.
- The query management unit 112 first calculates the conditional mutual information I(X, Y | C1) for the case where the condition set includes only that one random variable (for example, C1) among the random variables included in the final condition set.
- Next, the query management unit 112 calculates the conditional mutual information for each case where the condition set includes the one random variable (here, C1) together with one other random variable (one of C2, C3, C4, and so on) (step 906).
- The structure learning unit 110 determines whether the calculated conditional mutual information is less than ε (step 908). If it is less than ε ("Yes" in step 908), the pair consisting of the random variable pair ({X, Y}) and the condition set (for example, {C1, C2}) is stored in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (step 910). If it is equal to or greater than ε ("No" in step 908), it is determined whether the calculated conditional mutual information is larger than the current minimum conditional mutual information (step 912).
- If it is larger ("Yes" in step 912), the structure learning unit 110 deletes the other random variable from the candidate condition set C' (bold) (step 914). If it is smaller ("No" in step 912), the structure learning unit 110 stores the conditional mutual information at this time as the minimum conditional mutual information and stores the condition set at this time as the condition set that minimizes the conditional mutual information (step 916).
- Next, the query management unit 112 calculates the conditional mutual information for each case where only one of the random variables remaining in the candidate condition set, other than the one random variable of interest above (C1 in the above example), is included in the condition set (step 918). If the calculated conditional mutual information is smaller than the current minimum conditional mutual information, the conditional mutual information at this time is stored as the minimum conditional mutual information, and the condition set at this time is stored as the condition set that minimizes the conditional mutual information (step 920).
- Next, the query management unit 112 calculates the conditional mutual information for each case where the one random variable currently focused on and one of the other random variables remaining in the candidate condition set are included in the condition set (step 922).
- If the calculated conditional mutual information is less than ε ("Yes" in step 924), the structure learning unit 110 stores the combination of the random variable pair and the condition set at this time in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (step 910). If the calculated conditional mutual information is equal to or greater than ε ("No" in step 924), it is determined whether the value is larger than the value calculated in step 918 (step 926).
- If it is larger ("Yes" in step 926), the other random variable is deleted from the candidate condition set (step 928). If it is smaller ("No" in step 926), the conditional mutual information at this time is stored as the minimum conditional mutual information, and the condition set at this time is stored as the condition set that minimizes the conditional mutual information (step 930).
- Next, the query management unit 112 calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set (step 1004). It is then determined whether the calculated conditional mutual information is less than ε (step 1006). If it is less than ε ("Yes" in step 1006), the structure learning unit 110 stores the pair consisting of the random variable pair and the condition set at this time in the global cut set in the cut set holding unit 116 (step 1008), and it is determined that no edge is required between the random variable pair. If it is equal to or greater than ε ("No" in step 1006), the structure learning unit 110 determines that an edge is necessary between the random variable pair (step 1010).
- In the above description, the conditional mutual information is calculated while increasing the size of the condition set from 1 to 2, and when no conditional mutual information less than ε has been obtained, the conditional mutual information is calculated using all the random variables remaining in the candidate condition set as the condition set and it is determined whether this is less than ε.
- However, the size of the condition set may be further increased to 3, 4, and so on, and processing may be performed up to a size equal to or smaller than the size of the final condition set, as shown in FIG. 12B.
- As described above, in step 312, for each random variable pair whose mutual information is ε or more but for which no edge was added in step 310, an edge is added if necessary.
- At that time, for the two random variable nodes constituting the pair, the conditional mutual information is calculated using, as the condition set, a set of random variables included in the set of nodes that lie on a path between the two random variable nodes and are adjacent to either of them. If a condition set whose conditional mutual information is less than ε is found, no edge is added between the two random variables; otherwise, an edge is added.
- More specifically, the set of nodes on the path between the two random variable nodes and adjacent to one of them is used as the final condition set, and the conditional mutual information about the two random variable nodes and the condition set C (bold) is calculated while increasing the size of the condition set C (bold) from 1 up to the size of the final condition set. If a conditional mutual information value less than ε is obtained in the process, no edge is added between the two random variables; if no such value is obtained, an edge is added.
- Moreover, in the calculation of the conditional mutual information, when the joint probability distribution for the states of the two random variables and the state set corresponding to each condition variable is less than σ, the calculation of the related components is omitted.
- FIG. 11 shows an example of the edgeNeeded_H routine called in the thickening routine of FIG. 8 and used in the processing of FIGS. 9 and 10.
- FIG. 12A and FIG. 12B show an example of the edgeNeededBody routine called in the edgeNeeded_H routine.
- the routine in FIG. 12A is configured to calculate the conditional mutual information by increasing the size of the condition set C (bold) in order from 1 to 2.
- the routine of FIG. 12B is configured to perform calculation while increasing the size of the condition set from 1 to the size of the final condition set.
- As described above, in the present embodiment, the conditional mutual information is calculated using various sets of random variables as the condition set.
- Specifically, the conditional mutual information is first calculated when the size of the condition set is small (size 1), and then the conditional mutual information is calculated while increasing the size of the condition set. The calculation is repeated until a condition set whose conditional mutual information is less than the threshold ε is found; if such a set is found, it is determined that no edge is necessary between the random variable pair of interest, and if it is not found, it is determined that an edge is necessary.
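- A simplified sketch of this search strategy is given below. The helper cmi(x, y, cond) is assumed to return the conditional mutual information, and the MDF-based pruning is written under the assumption that the condition variable whose addition increases the conditional mutual information is the one discarded:

```python
from itertools import combinations

def edge_needed(cmi, x, y, final_condition_set, epsilon):
    """Return (edge_needed, cut_set) for the pair (x, y)."""
    candidates = list(final_condition_set)
    single = {}
    # Step 1: condition sets of size 1.
    for c in candidates:
        single[c] = cmi(x, y, frozenset({c}))
        if single[c] < epsilon:
            return False, frozenset({c})              # cut set found: no edge
    # Step 2: condition sets of size 2, with MDF-based pruning.
    for c1, c2 in combinations(list(candidates), 2):
        if c1 not in candidates or c2 not in candidates:
            continue                                  # already pruned
        value = cmi(x, y, frozenset({c1, c2}))
        if value < epsilon:
            return False, frozenset({c1, c2})
        if value > single[c1] and c2 in candidates:
            candidates.remove(c2)                     # c2 only increases the dependence
        if value > single[c2] and c1 in candidates:
            candidates.remove(c1)                     # c1 only increases the dependence
    # Step 3: all remaining candidates at once.
    value = cmi(x, y, frozenset(candidates))
    if value < epsilon:
        return False, frozenset(candidates)
    return True, None                                 # no cut set found: edge is needed
```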
- In the present embodiment, a combination of a random variable and its value is used as the item handled by the frequent item set extraction algorithm. The conditional mutual information I(X, Y | C (bold)) is expressed by the following equation (7).
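- Equation (7) is not reproduced in this text; the component referred to below is assumed to have the standard form, rewritten in terms of joint probabilities so that each term is governed by P(x, y, c (bold)):

```latex
P(x, y, \mathbf{c})\, \log \frac{P(x, y, \mathbf{c})\, P(\mathbf{c})}{P(x, \mathbf{c})\, P(y, \mathbf{c})}
```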
- In Equation (7), if P(x, y, c (bold)) is less than the minimum support σ, the component constituting the right side of Equation (7) is also less than σ. As already described, 0 < σ ≤ ε, so the component is at this time less than σ ≤ ε. Therefore, it can be seen that the value is less than ε without directly calculating each component on the right side of Equation (7).
- FIG. 13 shows the processing flow when the number of random variables included in the condition set is one.
- First, the query management unit 112 calculates P(c (bold)) (where c (bold) is the state set corresponding to the condition variable) and determines whether this value is less than σ (step 1102). If it is less than σ ("Yes" in step 1102), the query management unit 112 does not calculate the conditional mutual information component (the right side of Expression (7)). If it is equal to or greater than σ ("No" in step 1102), it is determined whether P(x | c (bold)) is 0. If it is 0, the conditional mutual information component is not calculated; otherwise ("No" in step 1106), it is determined whether P(y | c (bold)) is 0. If it is 0, the conditional mutual information component is not calculated; otherwise ("No" in step 1110), it is determined whether P(x, y, c (bold)) is less than σ (step 1112). If it is less than σ ("Yes" in step 1112), the conditional mutual information component is not calculated. If it is equal to or greater than σ ("No" in step 1112), the query management unit 112 calculates the component of the conditional mutual information for the current set of random variables and states (step 1114).
- The state of one random variable (for example, X) in the current random variable pair is fixed, and steps 1106 to 1114 are repeated for all possible states of the other random variable (for example, Y) (step 1116). Further, after repeating steps 1104 to 1116 for all possible states of the fixed random variable (X) (step 1118), the components calculated so far are added to obtain the conditional mutual information (step 1120).
- FIG. 14 shows the processing flow when the number of random variables included in the condition set is two or more.
- First, all combinations of one fewer variable than the number of random variables in the condition set are generated from the condition set (step 1402).
- the query management unit 112 inquires whether all combinations are stored in the query result cache unit 114 as search condition keys (step 1404). If any combination is not stored (“No” in step 1404), the following processing is not performed and the process is terminated (step 1406).
- If all combinations are stored ("Yes" in step 1404), it is determined whether P(c (bold)) (where c (bold) is the state value set corresponding to each condition variable) is less than σ (step 1408). If it is less than σ ("Yes" in step 1408), the query management unit 112 does not calculate the conditional mutual information component. If it is equal to or greater than σ ("No" in step 1408), the random variables and states at this time are used as the search condition key, and the number of data records matching the search condition is stored as the value in the query result cache unit 114 (step 1410). The set of random variables and states is also added to the frequent item set Fn, whose size equals the number n of random variables.
- Next, it is determined whether P(x | c (bold)) is 0 (step 1412). If it is 0, the conditional mutual information component is not calculated. Otherwise, it is determined whether P(y | c (bold)) is 0 (step 1414). If it is not 0 ("No" in step 1414), all subsets that are smaller by one element than the set consisting of the two random variables constituting the current random variable pair and the current condition set are considered (step 1416), and it is determined whether all of these subsets are included in the frequent item sets (F1, F2, ...) (step 1418). If any subset is not included ("No" in step 1418), the conditional mutual information component is not calculated. If all are included ("Yes" in step 1418), it is determined whether P(x, y, c (bold)) is less than σ (step 1420). If it is less than σ ("Yes" in step 1420), the conditional mutual information component is not calculated.
- If it is equal to or greater than σ ("No" in step 1420), the random variables and states at this time are used as the search condition key, and the number of data records matching the search condition is stored as the value in the query result cache unit 114 (step 1422). The set of random variables and states is also added to the frequent item set Fn, whose size equals the number n of random variables.
- a conditional mutual information component is calculated for the current set of random variables and states (step 1424). Further, the state of one random variable (X) in the current random variable pair is fixed, and Steps 1414 to 1424 are repeated for all possible states of the other random variable (Y) (Step 1426). Steps 1412 to 1426 are repeated for all possible states of the fixed random variable (X) (step 1428). Finally, all the components of the conditional mutual information calculated so far are added to obtain the conditional mutual information (step 1430).
- Examples of the calcConditionalMI routine and the haveValidCandidate routine used in the processing of FIGS. 13 and 14 are shown in FIGS. 15 and 16, respectively.
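- As the routines of FIGS. 15 and 16 are not reproduced here, the following is a rough sketch of a conditional mutual information calculation with the σ-based pruning described above (prob and the assignment representation are assumptions made for illustration):

```python
import math
from itertools import product

def condition_assignments(cond_vars, states):
    """All assignments of states to the condition variables, as frozensets of pairs."""
    for combo in product(*(states[v] for v in cond_vars)):
        yield frozenset(zip(cond_vars, combo))

def calc_conditional_mi(x_var, y_var, cond_vars, states, prob, sigma):
    """Conditional mutual information that skips components whose joint
    probability falls below the minimum support sigma."""
    total = 0.0
    for c in condition_assignments(cond_vars, states):
        p_c = prob(c)
        if p_c < sigma:
            continue                                   # prune the whole condition block
        for x in states[x_var]:
            for y in states[y_var]:
                p_xyc = prob(c | {(x_var, x), (y_var, y)})
                if p_xyc < sigma:
                    continue                           # prune this component
                p_xc = prob(c | {(x_var, x)})
                p_yc = prob(c | {(y_var, y)})
                total += p_xyc * math.log((p_xyc * p_c) / (p_xc * p_yc))
    return total
```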
- Next, the information processing apparatus 100 performs the edge reduction processing in step 314. Detailed processing is shown in FIG. 17.
- First, for each random variable pair having an undirected edge, the structure learning unit 110 inquires of the graph structure construction unit 118 whether there is another path between the random variable pair (step 1702). When a notification that there is no other path is received from the graph structure construction unit 118 ("No" in step 1702), the process proceeds to step 1702 for another random variable pair. When a notification that there is another path is received from the graph structure construction unit 118 ("Yes" in step 1702), the structure learning unit 110 instructs the graph structure construction unit to temporarily delete the undirected edge between the random variable pair (step 1704). Then, the processing shown in FIGS. 9 and 10 is executed for the random variable pair, and it is determined whether an edge is necessary between the random variable pair (step 1706).
- When an edge is necessary ("Yes" in step 1706), the structure learning unit 110 instructs that the temporarily deleted undirected edge be added again between the random variable pair (step 1708). When the edge is not necessary ("No" in step 1706), the structure learning unit 110 does not give such an instruction, and the undirected edge remains deleted. Steps 1702 to 1708 are repeated for all random variable pairs that have undirected edges when the edge increase process 312 is completed (step 1710).
- Next, for each random variable pair having an undirected edge at the present time, the structure learning unit 110 inquires of the graph structure construction unit 118 whether at least one of the random variable nodes constituting the pair has three or more adjacent nodes other than the pair counterpart (step 1712).
- When the structure learning unit 110 receives notification from the graph structure construction unit 118 that none of the random variable nodes constituting the pair has three or more adjacent nodes other than the pair counterpart, the process proceeds to step 1712 for another random variable pair.
- Upon receiving notification from the graph structure construction unit 118 that at least one random variable node constituting the pair has three or more adjacent nodes other than the pair counterpart ("Yes" in step 1712), the structure learning unit 110 instructs the graph structure construction unit to temporarily delete the undirected edge between the random variable pair (step 1714). Then, the set consisting of the nodes adjacent to one of the two random variable nodes on the path between the two random variable nodes constituting the random variable pair and the nodes further adjacent to those adjacent nodes is set as the final condition set (step 1716). Then, the processing from step 904 onward in FIG. 9 and the processing in FIG. 10 are executed for the random variable pair and the final condition set, and it is determined whether an edge is necessary between the random variable pair (step 1718).
- If the edge is necessary ("Yes" in step 1718), the structure learning unit 110 instructs that the temporarily deleted undirected edge be added again between the random variable pair (step 1720). If the edge is not necessary ("No" in step 1718), the structure learning unit 110 does not give such an instruction, and the undirected edge remains deleted.
- The processing of steps 1712 to 1720 is repeated for all random variable pairs that have undirected edges when the processing of steps 1702 to 1710 is completed (step 1722).
- As described above, in step 314, it is determined, for each random variable pair having an edge after the processing in step 312, whether the edge is necessary, and the edge is deleted if it is unnecessary. At that time, if a random variable pair having an undirected edge is also connected by a path other than that undirected edge, the undirected edge is temporarily deleted, it is determined in the same manner as in step 312 whether an edge is necessary for the random variable pair, and the temporarily deleted undirected edge is added back if it is necessary. Furthermore, when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the pair counterpart, the undirected edge is temporarily deleted.
- Then, the set consisting of the nodes adjacent to one of the two random variable nodes on the path between the two random variable nodes and the nodes further adjacent to those adjacent nodes is set as the final condition set, and, as in step 312, it is determined whether the edge is necessary; the temporarily deleted undirected edge is added back if it is necessary.
- An example of the Thinning (edge reduction) routine is shown in FIG. 18, and an example of the edgeNeeded routine called in that routine is shown in FIG. 19.
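- A minimal sketch of this edge reduction loop (hypothetical graph and test helpers) is:

```python
def thinning(graph, undirected_edges, has_other_path, edge_needed):
    """For each undirected edge whose endpoints are also connected by another
    path, temporarily delete the edge, re-run the edge necessity test, and
    restore the edge only if it is still needed."""
    for x, y in list(undirected_edges):
        if not has_other_path(graph, x, y):
            continue
        graph.remove_edge(x, y)              # temporarily delete the edge
        needed, cut_set = edge_needed(graph, x, y)
        if needed:
            graph.add_edge(x, y)             # restore the temporarily deleted edge
        # otherwise the edge stays deleted and the cut set is recorded in the
        # global cut set by the edge necessity test
```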
- Next, the information processing apparatus 100 executes orientation processing for the undirected edges included in the graph structure generated in the graph structure construction unit 118 through the operations up to the edge reduction processing 314 (step 316). Detailed processing is shown in FIG. 20.
- For each set of three random variables X, Y, and Z such that X and Y, and Y and Z, are directly connected by edges but X and Z are not directly connected, the structure learning unit 110 inquires of the cut set holding unit 116 whether 1) an element having {X, Z} as its random variable pair (for example, <{X, Z}, C (bold)>) is registered in the global cut set and 2) Y is not included in the condition set C (bold) of that element (step 2002).
- If these conditions are satisfied ("Yes" in step 2002), the structure learning unit 110 instructs the graph structure construction unit 118 to direct the undirected edges so that X and Z become parents of Y (step 2004). If not ("No" in step 2002), the processing of step 2002 is performed for another set of three random variables satisfying the above relationship.
- Next, the structure learning unit 110 focuses on each set of three random variables (here, X, Y, and Z) included in the vertex set.
- The structure learning unit 110 inquires of the graph structure construction unit 118 whether all of the following conditions are satisfied: 1) X is a parent of Y, 2) Y and Z are adjacent, 3) X and Z are not adjacent, and 4) the edge between Y and Z is an undirected edge (step 2008). If the conditions are satisfied ("Yes" in step 2008), the structure learning unit 110 instructs the graph structure construction unit 118 to direct the edge so that Y becomes the parent of Z (step 2010). If the conditions are not satisfied ("No" in step 2008), the processing of step 2008 is executed for another set of three random variables included in the vertex set. Steps 2008 and 2010 are performed for all sets of three random variables included in the vertex set (step 2012).
- Next, for each random variable pair (X, Y) that has an undirected edge at the present time, the structure learning unit 110 inquires of the graph structure construction unit 118 whether there is a valid path between the random variable pair (step 2014). If such a path exists ("Yes" in step 2014), the structure learning unit 110 instructs the graph structure construction unit 118 to direct the edge so that X becomes the parent of Y (step 2016). Steps 2014 and 2016 are performed for all undirected edges (step 2018).
- An example of the orientEdge routine used for edge orientation is shown in FIG. 21.
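- A minimal sketch of the first two orientation rules above is given below (hypothetical helpers; the collider rule is written under the assumption that Y must not appear in the condition set recorded for {X, Z}):

```python
def orient_edges(nodes, undirected, cut_sets, adjacent):
    """undirected: set of frozenset({a, b}) edges; cut_sets: map from
    frozenset({x, z}) to the recorded condition set; adjacent(a, b): whether
    a and b are directly connected. Returns (directed, undirected)."""
    directed = set()
    # Rule 1 (steps 2002-2004): for X - Y - Z with X and Z non-adjacent,
    # orient X -> Y <- Z when Y is not in the recorded cut set of {X, Z}.
    # Node labels are assumed to be comparable (e.g. strings).
    for x in nodes:
        for z in nodes:
            if x >= z or adjacent(x, z):
                continue
            for y in nodes:
                if y in (x, z) or not (adjacent(x, y) and adjacent(z, y)):
                    continue
                cut = cut_sets.get(frozenset({x, z}))
                if cut is not None and y not in cut:
                    directed.update({(x, y), (z, y)})
                    undirected.discard(frozenset({x, y}))
                    undirected.discard(frozenset({z, y}))
    # Rule 2 (steps 2008-2010): if X -> Y, Y - Z is undirected, and X and Z
    # are not adjacent, orient Y -> Z; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for x, y in list(directed):
            for z in nodes:
                if z in (x, y) or adjacent(x, z):
                    continue
                if frozenset({y, z}) in undirected:
                    directed.add((y, z))
                    undirected.discard(frozenset({y, z}))
                    changed = True
    return directed, undirected
```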
- the graph structure held by the graph structure construction unit 118 after the execution of step 316 in FIG. 3 represents a Bayesian network structure learned through a series of processes of the present embodiment.
- the information processing apparatus 100 outputs this as the Bayesian network structure description file 120 (step 318) and completes the process (step 320).
- The information processing apparatus 100 described here includes the plurality of components shown in FIG. 1, but this configuration is merely an example. That is, the functions of the control unit 102, the data specification analysis unit 104, the structure learning unit 110, the query management unit 112, the query result cache unit 114, the cut set holding unit 116, and the graph structure construction unit 118 of the learning apparatus of the present invention may each be configured to execute on a plurality of components, or all of these functions may be performed on a single component (for example, a computer processor).
- the information processing apparatus 100 may include a database 106.
- In the above, the present invention has been described as being implemented as the information processing apparatus 100. However, the present invention can also be realized as a program that causes a computer to operate as the components shown in FIG. 1, or as a program that causes a computer to execute the steps illustrated in FIG. 3.
- This embodiment is characterized in that Bayesian network structure learning is performed using a new algorithm in which a frequent item set extraction algorithm is combined with a constraint-based learning algorithm.
- the technical idea of the present invention has been described using a specific constraint-based learning algorithm and a frequent item set extraction algorithm for specific explanation.
- the technical idea of the present invention can be realized using other constraint-based learning algorithms and frequent item set extraction algorithms.
- However, the information processing apparatus and program according to the present embodiment do not simply use the above algorithms in combination. In other words, they do not operate as a simple pipeline in which the joint probability distribution extracted by the frequent item set extraction algorithm is obtained as its output and then merely passed as input to the constraint-based learning algorithm.
- Rather, the present invention uses a unique algorithm in which the frequent item set extraction algorithm is incorporated inside the constraint-based learning algorithm, and it operates accordingly. In a constraint-based learning algorithm, the number of random variables given as conditions for the conditional mutual information does not always increase monotonically as the algorithm progresses, whereas the items handled in the present invention are pairs of a random variable and its value, which a frequent item set extraction algorithm enumerates on the premise of monotonically growing item sets. The present invention therefore realizes effective incorporation of the frequent item set extraction algorithm by changing the constraint-based learning algorithm so that the number of random variables to be handled keeps monotonically increasing locally.
- As the basis of the experiment, a Bayesian network with 37 nodes and 46 edges called Alarm, taken from the Bayesian Network Repository (http://compbio.cs.huji.ac.il/Repository) and frequently used as an example of Bayesian network learning, was used. The Bayesian network used for the experiment is shown in FIG. 5000, 15000, 30000 and 50000 data records were generated for this network, and Bayesian network structure learning was performed by using each of these data sets as input and operating a computer as the information processing apparatus of the present embodiment, both according to the conventional TPDA algorithm and according to the algorithm of the present embodiment.
- Tables 1 to 4 show the experimental results, representing the average values and the like obtained by executing five data sets for each of the above data sizes. Missing Edge and Extra Edge indicate, respectively, the number of edges missing from the estimated Bayesian network and the number of extra edges added to it, when compared to the correct Bayesian network. The tables also show, for each algorithm, the execution time and its standard deviation normalized so that the execution time with the conventional TPDA algorithm is 1, as well as the execution time of each algorithm normalized so that its execution time with 5000 data records is 1.
- As can be seen from these results, according to the present embodiment, Bayesian network structure learning can be greatly speeded up and the variation in execution time can be greatly reduced as compared with the case where the conventional TPDA algorithm is used. It can also be seen that the error from the correct network, indicated by Missing Edge and Extra Edge, is suppressed to a level comparable to that of the prior art. As described above, the present embodiment has the excellent effect that the speedup of structure learning and the stabilization of the execution time can be realized without sacrificing the accuracy of the estimated Bayesian network as compared with the conventional technique.
- the configuration of the information processing apparatus 100 of the present embodiment is shown in FIG. Further, the basic processing by the information processing apparatus 100 in this embodiment is as shown in FIG.
- When an instruction to perform structure learning is given, the information processing apparatus 100 starts processing (step 302). The instruction is configured to include connection information for accessing the database 106 that stores the data serving as the basis for Bayesian network structure learning, and a data specification description file name. The operation parameters include a threshold value ε (for example, 0.01) for the mutual information and the conditional mutual information used in Bayesian network structure learning.
- the file name of the output Bayesian network structure description file may be included.
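- As a concrete illustration, such a start instruction and its operation parameters could be represented as follows; this is only a minimal sketch, and the field names are hypothetical rather than taken from the original description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructureLearningRequest:
    # Connection information for the database 106 holding the input data.
    db_url: str
    # Name of the data specification description file 108.
    data_spec_file: str
    # Threshold epsilon for (conditional) mutual information, e.g. 0.01.
    epsilon: float = 0.01
    # Optional name of the output Bayesian network structure description file 120.
    output_file: Optional[str] = None
```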
- the information processing apparatus 100 performs initial processing.
- The control unit 102 checks whether or not the operation parameters are normal (step 304). If there is an error, the control unit 102 terminates the processing (step 320). If they are normal, the data specification is analyzed by the data specification analysis unit 104 (step 306).
- the data specification analysis unit 104 reads the data specification description file 108 and holds the names of the random variables, the number of random variables, the names of all the states that can be taken by the random variables, and the number of states. Next, the data specification analysis unit 104 connects to the database 106 using the database connection information, acquires the number of all data, and holds it (step 308). After step 308, the control unit 102 transfers control to the structure learning unit 110.
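- A minimal sketch of the kind of information the data specification analysis unit 104 could hold after step 308 is shown below; the class, table and function names are illustrative assumptions, and a SQLite connection stands in for the database 106.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class DataSpec:
    # variable name -> list of state (realized value) names, e.g. {"A": ["y", "n"], ...}
    states: dict = field(default_factory=dict)
    total_records: int = 0

    @property
    def variable_names(self):
        return list(self.states)

    @property
    def num_variables(self):
        return len(self.states)

    def num_states(self, var):
        return len(self.states[var])

def load_spec(spec_states, db_path, table="purchases"):
    """Hold the variable/state names from the specification and the total record count from the database."""
    spec = DataSpec(states=dict(spec_states))
    with sqlite3.connect(db_path) as conn:            # stand-in for database 106
        spec.total_records = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return spec
```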
- the structure learning unit 110 performs a tree structure preparation process and generates a tree structure for the given data (step 310).
- the process of step 310 is shown in more detail in FIG.
- the query management unit 112 calculates mutual information for all random variable pairs (step 2302).
- If the calculated mutual information is equal to or greater than the threshold ε, the structure learning unit 110 adds the random variable pair to a random variable pair array stored in a storage unit (not shown) in the information processing apparatus 100 or the like.
- Next, the structure learning unit 110 sorts the random variable pairs stored in the random variable pair array in descending order of mutual information (step 2304). Then, in descending order of mutual information, the graph structure construction unit 118 is asked whether the graph structure would remain a tree structure even if an edge were added between the random variable pair (step 2306). If the graph structure construction unit 118 notifies the structure learning unit 110 that adding the edge would not result in a tree structure ("No" in step 2306), no edge is added for that pair. On the other hand, if the graph structure construction unit 118 notifies that the tree structure would be maintained even if the edge were added ("Yes" in step 2306), the structure learning unit 110 instructs the graph structure construction unit 118 to add an undirected edge between the random variable pair currently focused on, and deletes the random variable pair from the random variable pair array (step 2308). Steps 2306 and 2308 are repeated for all random variable pairs in the random variable pair array (step 2310).
- In this way, a tree-structured graph is generated by adding edges only while the graph structure remains a tree structure.
- the generated graph structure may be stored in the graph structure construction unit 118, a storage unit (not shown) in the information processing apparatus 100, the structure learning unit 110, or the like.
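- A minimal sketch of this tree structure preparation (drafting) phase is given below, assuming the pairwise mutual information values have already been computed; a union-find structure stands in for the graph structure construction unit's tree check, and all names are illustrative.

```python
def drafting(pairs_with_mi, epsilon):
    """pairs_with_mi: dict {(x, y): mutual information}. Returns (tree_edges, remaining_pairs)."""
    # Keep only pairs whose mutual information reaches the threshold epsilon.
    candidates = [(mi, pair) for pair, mi in pairs_with_mi.items() if mi >= epsilon]
    candidates.sort(reverse=True)              # descending mutual information (step 2304)

    parent = {}
    def find(v):                               # union-find stand-in for the tree check
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree_edges, remaining = [], []
    for mi, (x, y) in candidates:
        rx, ry = find(x), find(y)
        if rx != ry:                           # the graph stays a forest if this edge is added (step 2306)
            parent[rx] = ry
            tree_edges.append((x, y))          # add an undirected edge (step 2308)
        else:
            remaining.append((x, y))           # left for the edge increasing (thickening) phase
    return tree_edges, remaining
```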
- Next, the structure learning unit 110 executes an edge increasing process in step 312. For each random variable pair whose mutual information is equal to or larger than ε but to which no undirected edge was added in the tree structure preparation process because adding it would not leave a tree structure (that is, the random variable pairs remaining in the random variable pair array), the structure learning unit 110 determines, using conditional mutual information, whether an edge is actually needed, and adds an undirected edge between the random variable pair if it is determined to be necessary.
- An example of the thickening (edge increase) routine at this time is shown in FIG.
- FIG. 25 shows details of main processes executed in the thickening routine in this embodiment.
- For each random variable pair (for example, X and Y) whose mutual information is ε or more but which does not have an undirected edge (that is, each random variable pair remaining in the random variable pair array), the structure learning unit 110 sets, as the final condition set (ConditionSet, expressed as Z (bold) in this embodiment), the set of nodes that exist on a path having one of the two random variable nodes as its start point and the other as its end point and that are adjacent to one of these two random variable nodes (step 2402).
- a candidate condition set Z c (bold) having the same random variable set as the final condition set is generated.
- Next, for one random variable (for example, Z 1) included in the final condition set Z (bold) (for example, {Z 1, Z 2, Z 3, Z 4, ...}), that is, one of the random variables lying on a path between X and Y and adjacent to X or Y, the query management unit 112 calculates the conditional mutual information for the case where only that one random variable is included in the condition set (step 2404). The calculated conditional mutual information may be stored in the query management unit 112 or a storage unit (not shown) of the information processing apparatus 100 in association with the condition set used in the calculation (containing only Z 1 in this case).
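- To make the conditional mutual information of equation (3) concrete, the following is a minimal sketch of how such a calculation could be carried out from raw record counts; the optional cutoff delta mirrors the second-threshold optimization mentioned elsewhere in this description, and the function name and data layout are assumptions.

```python
import math
from collections import Counter

def conditional_mi(records, x, y, cond, delta=0.0):
    """Estimate I(X; Y | C) of equation (3) from a list of {variable: state} records.

    delta: joint probabilities below this cutoff are skipped, mirroring the
    second-threshold optimization described for this embodiment.
    """
    n = float(len(records))
    p_xyc, p_xc, p_yc, p_c = Counter(), Counter(), Counter(), Counter()
    for r in records:
        c = tuple(r[v] for v in cond)
        p_xyc[(r[x], r[y], c)] += 1
        p_xc[(r[x], c)] += 1
        p_yc[(r[y], c)] += 1
        p_c[c] += 1

    cmi = 0.0
    for (xv, yv, c), cnt in p_xyc.items():
        pxyc = cnt / n
        if pxyc < delta:
            continue                                   # skip negligible joint-probability terms
        num = pxyc * (p_c[c] / n)
        den = (p_xc[(xv, c)] / n) * (p_yc[(yv, c)] / n)
        cmi += pxyc * math.log(num / den)
    return cmi
```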
- Next, the structure learning unit 110 determines whether or not the conditional mutual information calculated in step 2404 is less than ε (step 2406). If it is less than ε ("Yes" in step 2406), the pair of the random variable pair ({X, Y}) and the condition set at this time (for example, {Z 1}) is stored in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (step 2408).
- On the other hand, if the conditional mutual information calculated in step 2404 using one random variable as the condition set is equal to or larger than ε ("No" in step 2406), the process returns to step 2404, and the conditional mutual information is calculated for the case where only another random variable (for example, Z 2) is included in the condition set. Thereafter, steps 2404 and 2406 are repeated in the same manner. That is, the calculation of the conditional mutual information is repeated with each single random variable included in the final condition set Z (bold) as the condition set. The calculated conditional mutual information may also be stored in the query management unit 112 or the storage unit (not shown) of the information processing apparatus 100 in association with the condition set used for the calculation. If a random variable whose conditional mutual information is less than ε is found in this process ("Yes" in step 2406), the process proceeds to step 2408.
- If all of the conditional mutual information values calculated by the execution of steps 2404 and 2406 are greater than or equal to ε, the process proceeds to step 2410.
- In step 2410, the query management unit 112 calculates the conditional mutual information for the case where two of the random variables included in the final condition set Z (bold) (for example, Z 1 and Z 2) are included in the condition set (step 2410). The calculated conditional mutual information may be stored in the query management unit 112, the storage unit (not shown) of the information processing apparatus 100, or the like in association with the condition set used in the calculation (here, {Z 1, Z 2}).
- Next, the structure learning unit 110 determines whether the conditional mutual information calculated in step 2410 is less than ε (step 2412). If it is less than ε ("Yes" in step 2412), the pair of the random variable pair ({X, Y}) and the condition set at this time (for example, {Z 1, Z 2}) is stored in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (here, X and Y) (step 2408).
- Otherwise, the structure learning unit 110 determines whether the conditional mutual information calculated in step 2410 is larger than the conditional mutual information already calculated in step 2404 using only one of the two random variables included in the condition set (for example, Z 1) as the condition set (step 2414). If it is larger ("Yes" in step 2414), the structure learning unit 110 deletes that random variable (here, Z 1) from the candidate condition set Z c (bold) stored in the storage unit (not shown) or the like. The structure learning unit 110 performs the same processing for the other of the two random variables (for example, Z 2) included in the condition set used in the calculation of step 2410 (steps 2418 and 2420).
- The structure learning unit 110 repeats steps 2410 to 2420 for all random variables remaining in the candidate condition set Z c (bold) (step 2422). For example, when the initial candidate condition set Z c (bold) includes six random variables Z 1 to Z 6 and Z 1 and Z 2 have been deleted from the candidate condition set Z c (bold), the processes of steps 2410 to 2420 are executed again for the remaining Z 3 to Z 6.
- Next, the query management unit 112 calculates the conditional mutual information using, as the condition set, all the random variables remaining in the candidate condition set Z c (bold) as a result of the above-described processing (step 2424).
- The structure learning unit 110 determines whether the conditional mutual information calculated in step 2424 is less than ε (step 2426). If it is less than ε ("Yes" in step 2426), the pair of the random variable pair ({X, Y}) and the candidate condition set at this time is stored in the global cut set in the cut set holding unit 116, and it is determined that no edge is required between the random variable pair (here, X and Y) (step 2408).
- Otherwise ("No" in step 2426), the structure learning unit 110 determines that an edge is necessary between the random variable pair and instructs the graph structure construction unit 118 to that effect (step 2428). In this way, an edge is added between a pair of random variables when it is determined that an edge is necessary.
- the generated graph structure may be stored in the graph structure construction unit 118, a storage unit (not shown) in the information processing apparatus 100, the structure learning unit 110, or the like.
- As described above, in this embodiment, only three stages are required: the processing when the size of the condition variable set is 1 (first stage, steps 2404, 2406 and 2408), the processing when the size of the condition variable set is 2 (second stage, steps 2410 to 2422 and 2408), and the processing for all random variables left in the candidate condition set after the first and second stages (third stage, steps 2424 to 2428 and 2408).
- On the other hand, the conventional TPDA searches for a cut set starting from the larger subset sizes of a given condition variable set (that is, in descending order), and thus requires a search of up to N - 2 stages (where N is the number of random variables). Therefore, in Bayesian network structure learning in a situation where there are a large number of random variables, this embodiment can significantly speed up the processing compared to the conventional TPDA.
- FIG. 26 shows an example of the edgeNeeded_H routine that is called in the thickening routine of FIG. 24 and used for the processing of FIG. 25, and FIG. 27 shows an example of the SearchCutSet routine called in the edgeNeeded_H routine. Details of the routine of FIG. 27 have already been described in relation to FIG.
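- The three-stage test described above (steps 2402 to 2428) can be sketched as follows. This is a minimal illustration that follows the pruning rule as stated for step 2414, reuses the hypothetical conditional_mi helper from the earlier sketch, and is not a transcription of the edgeNeeded_H or SearchCutSet routines shown in the figures.

```python
from itertools import combinations

def edge_needed(records, x, y, condition_set, epsilon, global_cut_sets):
    """Three-stage test of whether an edge X-Y is needed, given the final condition set."""
    candidates = set(condition_set)
    single_cmi = {}

    # Stage 1 (steps 2404-2408): condition sets of size 1.
    for z in condition_set:
        cmi = conditional_mi(records, x, y, (z,))
        single_cmi[z] = cmi
        if cmi < epsilon:
            global_cut_sets[frozenset({x, y})] = {z}
            return False                       # X and Y are separated; no edge is needed

    # Stage 2 (steps 2410-2422): condition sets of size 2, pruning the candidate set.
    for z1, z2 in combinations(sorted(set(condition_set)), 2):
        if z1 not in candidates or z2 not in candidates:
            continue
        cmi = conditional_mi(records, x, y, (z1, z2))
        if cmi < epsilon:
            global_cut_sets[frozenset({x, y})] = {z1, z2}
            return False
        # Pruning rule as described for step 2414: a variable whose single-variable
        # conditional mutual information is exceeded by the pairwise value is dropped.
        if cmi > single_cmi[z1]:
            candidates.discard(z1)
        if cmi > single_cmi[z2]:
            candidates.discard(z2)

    # Stage 3 (steps 2424-2428): all remaining candidates at once.
    if candidates:
        cmi = conditional_mi(records, x, y, tuple(sorted(candidates)))
        if cmi < epsilon:
            global_cut_sets[frozenset({x, y})] = set(candidates)
            return False
    return True                                # no cut set found; the edge is needed
```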
- the information processing apparatus 100 performs edge reduction processing in step 314. Detailed processing is shown in FIG.
- First, for each random variable pair having an undirected edge, the structure learning unit 110 inquires of the graph structure construction unit 118 whether there is another path between the random variable pair besides that edge (step 2702). When a notification that there is no other path is received from the graph structure construction unit 118 ("No" in step 2702), the processing proceeds to step 2702 for another random variable pair. When a notification that there is another path is received from the graph structure construction unit 118 ("Yes" in step 2702), the structure learning unit 110 instructs the graph structure construction unit 118 to temporarily delete the undirected edge between the random variable pair (step 2704). Then, the process shown in FIG. 25 is executed for the random variable pair, and it is determined whether or not an edge is necessary between the random variable pair (step 2706).
- As described above, the processing in step 2706 can be executed in at most three stages, so that the processing can also be speeded up here compared to the conventional TPDA. If an edge is necessary ("Yes" in step 2706), the structure learning unit 110 instructs the graph structure construction unit 118 to add again the temporarily deleted undirected edge between the random variable pair (step 2708). If the edge is unnecessary ("No" in step 2706), the structure learning unit 110 does not give such an instruction, and the undirected edge remains deleted. Steps 2702 to 2708 are repeated for all random variable pairs having undirected edges at the time the edge increasing process 312 is completed (step 2710).
- Next, for each random variable pair having an undirected edge at the present time, the structure learning unit 110 inquires of the graph structure construction unit 118 whether at least one of the random variable nodes constituting the pair has three or more adjacent nodes other than the other node of the pair (step 2712). Upon receiving notification from the graph structure construction unit 118 that neither of the random variable nodes constituting the pair has three or more adjacent nodes other than the counterpart of the pair ("No" in step 2712), the structure learning unit 110 proceeds to step 2712 for another random variable pair.
- When the notification that at least one random variable node constituting the pair has three or more adjacent nodes other than the other node is received from the graph structure construction unit 118 ("Yes" in step 2712), the structure learning unit 110 instructs the graph structure construction unit 118 to temporarily delete the undirected edge between the random variable pair (step 2714). Then, a set including the nodes adjacent to one of the two random variable nodes and the nodes further adjacent to those adjacent nodes, on the paths between the two random variable nodes constituting the random variable pair, is set as the final condition set (step 2716). Then, the processing from step 2404 onward in FIG. 25 is executed for the random variable pair and this final condition set, and it is determined whether or not an edge is necessary between the random variable pair (step 2718).
- If an edge is necessary ("Yes" in step 2718), the structure learning unit 110 instructs the graph structure construction unit 118 to add again the temporarily deleted undirected edge between the random variable pair (step 2720). If the edge is not necessary ("No" in step 2718), the structure learning unit 110 does not give such an instruction, and the undirected edge remains deleted.
- As in steps 2702 to 2710, the processing of steps 2712 to 2720 is repeated for all random variable pairs having undirected edges (step 2722).
- In step 314, as described above, it is determined for each random variable pair having an edge after the processing in step 312 whether or not the edge is actually necessary. At that time, if there is a path other than the undirected edge for a random variable pair having an undirected edge, the undirected edge is temporarily deleted, the necessity of an edge between the random variable pair is determined in the same manner as in step 312, and the temporarily deleted undirected edge is added again if it is determined to be necessary. Furthermore, when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node, the undirected edge is temporarily deleted, a set including the nodes adjacent to one of the two random variable nodes and the nodes further adjacent to those adjacent nodes on the paths between the two random variable nodes is used as the final condition set, and the temporarily deleted undirected edge is added again when it is determined to be necessary.
- FIG. 29 shows an example of a thinning (edge reduction) routine in this embodiment.
- FIG. 30 shows an example of the edgeNeeded routine that is called in this routine in this embodiment.
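- A minimal sketch of this edge reduction (thinning) phase is given below, reusing the hypothetical edge_needed helper from the previous sketch. For brevity the condition sets are built from all neighbours (and neighbours of neighbours) of the two nodes rather than only from nodes lying on paths between them, so this is an approximation of the described processing rather than a transcription of the thinning or edgeNeeded routines in the figures.

```python
def neighbors(edges, v):
    """All nodes adjacent to v in an undirected edge set of frozensets."""
    return {w for e in edges if v in e for w in e if w != v}

def has_other_path(edges, x, y, removed):
    """Depth-first search for an x-y path that does not use the removed edge."""
    remaining = edges - {removed}
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return True
        for w in neighbors(remaining, v) - seen:
            seen.add(w)
            stack.append(w)
    return False

def thinning(records, edges, epsilon, global_cut_sets):
    """Sketch of the edge reduction phase (steps 2702-2722) over undirected edges."""
    # First pass (steps 2702-2710): re-test edges that still have an alternative path.
    for edge in list(edges):
        x, y = sorted(edge)
        if not has_other_path(edges, x, y, edge):
            continue
        edges.discard(edge)                                   # temporary deletion (step 2704)
        cond = (neighbors(edges, x) | neighbors(edges, y)) - {x, y}
        if edge_needed(records, x, y, sorted(cond), epsilon, global_cut_sets):
            edges.add(edge)                                   # restore the edge (step 2708)

    # Second pass (steps 2712-2722): finer test when an endpoint has three or more other neighbours.
    for edge in list(edges):
        x, y = sorted(edge)
        if len(neighbors(edges, x) - {y}) < 3 and len(neighbors(edges, y) - {x}) < 3:
            continue
        edges.discard(edge)                                   # temporary deletion (step 2714)
        base = (neighbors(edges, x) | neighbors(edges, y)) - {x, y}
        cond = (base | {w for v in base for w in neighbors(edges, v)}) - {x, y}   # step 2716
        if edge_needed(records, x, y, sorted(cond), epsilon, global_cut_sets):
            edges.add(edge)                                   # restore the edge (step 2720)
    return edges
```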
- the information processing apparatus 100 executes an orientation process for the undirected edge included in the graph structure generated by the graph structure construction unit 118 through the operations up to the edge reduction process 314 (step 316).
- FIG. 31 shows an example of a routine used for edge orientation.
- The graph structure held by the graph structure construction unit 118 after execution of step 316 in FIG. 3 represents the Bayesian network structure learned through the series of processes of this embodiment.
- the information processing apparatus 100 outputs this as the Bayesian network structure description file 120 (step 318) and completes the process (step 320).
- The information processing apparatus 100 described in connection with the present embodiment includes the plurality of components shown in FIG. 1, but such a configuration is merely an example. That is, in the information processing apparatus according to the present embodiment, a plurality of the functions of the control unit 102, the data specification analysis unit 104, the structure learning unit 110, the query management unit 112, the query result cache unit 114, the cut set holding unit 116, and the graph structure construction unit 118 may be configured to be performed on a single component. Also, all of these functions may be performed on a single component (for example, a computer processor).
- the information processing apparatus 100 may include a database 106.
- Next, the maximum amount of calculation in the cut set search test using the method of this embodiment will be considered. In the first stage, the condition variable set size is 1, so the number of variables handled in each CI test (the determination of whether the conditional mutual information is less than ε) is 3, including X and Y. Therefore, the number of CI test patterns in the first stage is r^3, where r is the maximum number of states a variable can take. In the second stage, the condition variable set size is 2, so the number of patterns becomes r^4. In the third stage, corresponding to steps 2424 to 2428, all the random variables remaining in the candidate condition set are used as the condition set, so the number of patterns is at most r^N. Since the number of CI tests is at most on the order of N in the first stage, on the order of N^2 in the second stage, and 1 in the third stage, the maximum calculation amount is O(r^N + N^2 r^4).
- On the other hand, the number of CI test patterns of TPDA is r^N, r^(N-1), ..., r^3 in the first, second, and subsequent stages, respectively, and the maximum calculation amount of TPDA is obtained by multiplying the number of CI tests at each stage by the corresponding number of patterns and taking the sum. The difference between the two methods can be analyzed by removing the term r^N, which is common to both calculation amounts. That is, excluding the common term, the remaining calculation amount is O(r^(N-1)) for TPDA and O(N^2 r^4) for this embodiment. The former is of exponential order with respect to N, whereas the latter is of polynomial order in both N and r. Therefore, it was confirmed that, according to the present embodiment, the maximum calculation amount in the cut set search test can be reduced compared with TPDA.
- In addition, the minimum calculation amount of the cut set search test according to this embodiment is O(r^3). This is because, if a cut set is found in the first CI test of the first stage, the processing of the second and third stages is unnecessary, and only the r^3 patterns of that single CI test need to be computed.
- In contrast, when TPDA searches for a cut set of variables X and Y from a condition variable set Z (bold), subsets Z (bold)' ⊆ Z (bold) of the condition variable set are examined in descending order of subset size, and a CI test is performed to check whether the target subset is a cut set. Since the calculation amount of a CI test is of exponential order in the size of the given variable set, with r being the maximum number of states taken by a variable, the first CI test performed by TPDA already involves the full condition variable set, and even the minimum calculation amount of TPDA is therefore exponential in the size of that set.
- the method according to the present embodiment has a remarkable effect that both the maximum calculation amount and the minimum calculation amount can be reduced as compared with TPDA.
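- The orders of magnitude stated above can be summarized as follows (a condensed restatement only; N is the number of random variables and r is the maximum number of states per variable):

```latex
\begin{align*}
\text{Maximum cost of a cut set search (this embodiment):}\quad & O\!\left(r^{N} + N^{2} r^{4}\right)\\
\text{Excluding the } r^{N} \text{ term common to both methods:}\quad
 & O\!\left(N^{2} r^{4}\right)\ \text{(this embodiment)} \quad\text{vs.}\quad O\!\left(r^{N-1}\right)\ \text{(TPDA)}\\
\text{Minimum cost of a cut set search (this embodiment):}\quad & O\!\left(r^{3}\right)
\end{align*}
```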
- the present invention has been described as being implemented as the information processing apparatus 100. However, it will be apparent to those skilled in the art that the present invention can be implemented as a program that causes a computer to operate as some or all of the components shown in FIG. It will also be apparent to those skilled in the art that the present invention can be implemented as a program that causes a computer to execute some or all of the steps described in FIG.
- The Bayesian network shown in FIG. 22 was used as the basis of the experiment. 5000, 15000, 30000 and 50000 data records were generated for this network, and Bayesian network structure learning was performed by using each of these data sets as input and operating a computer as the information processing apparatus of the present embodiment, both according to the conventional TPDA algorithm and according to the algorithm of the present embodiment.
- Tables 1 to 4 show the experimental results, representing the average values obtained by executing 10 data sets for each of the above data sizes.
- the result of using the conventional TPDA is shown in the “TPDA” line, and the result of this example is shown in the “TS (Three-Staged) -TPDA” line.
- Missing Edge and Extra Edge indicate the number of edges lost in the estimated Bayesian network and the number of extra edges added, respectively, when compared to the correct Bayesian network.
- As can be seen from these results, according to the present embodiment, Bayesian network structure learning can be significantly speeded up and the variation in execution time can be greatly reduced compared with the case where the conventional TPDA algorithm is used. It can also be seen that the error from the correct network, indicated by Missing Edge and Extra Edge, is suppressed to a level comparable to that of the prior art. As described above, the present embodiment has the excellent effect that structure learning can be significantly speeded up and the execution time stabilized without sacrificing the accuracy of the estimated Bayesian network as compared with the prior art.
Description
The present invention relates to an information processing apparatus and program for Bayesian network structure learning, and more particularly, to perform Bayesian network structure learning at high speed in a stable calculation time in a situation where there are a large number of random variables and a large amount of data. The present invention relates to an information processing apparatus and a program that can be used.
The Bayesian network, which is an expression of probabilistic causality, is known for its high inference accuracy. However, Bayesian network structure learning is known to be NP-complete with respect to the number of nodes, and learning a large number of random variables from a large amount of data is a very difficult problem.
Bayesian network structure learning algorithm extracts causal relations by scoring-based method aiming at maximizing approximation of true joint probability distribution and joint probability distribution expressed by Bayesian network and independence judgment between random variables It is roughly classified into a constraint-based method (also called CI (Conditional Independence) -based) for the purpose of doing this. A constraint-based learning algorithm is a generic name for algorithms that perform conditional independent tests between random variables using given data and determine whether edges (also called arcs) are required between random variables based on the results. is there. When dealing with large-scale random variables, it is known that the inference accuracy is higher when a method for determining independence between variables is used.
As one of the constraint-based learning algorithms, an algorithm called TPDA (Three Phase Dependency Analysis) has been proposed (Non-Patent Document 1). TPDA is an algorithm that suppresses the number of conditional independent tests to O (N 4 ) by assuming a condition called MDF (monotone DAG faithful) in a true probability structure. Here, DAG faithful means that the true causal structure is an acyclic directed graph. The model of the Bayesian network is expressed as N = <V, E, P> or N = <G, P>. Here, G = (V, E) is an acyclic directed graph composed of a vertex set V and an edge set E, and P is P = {P, where pa (x n ) is a parent variable of the nth random variable x n. (X 1 | pa (x 1 )),..., P (x n | pa (x n ))}. Set P has a corresponding joint probability distribution P (U)
- given as P(U) = Π_{i=1}^{n} P(x_i | pa(x_i)). Let Open_N(X, Y | C) (C in bold) denote, given the model N, the set of random variables on the paths between the vertices X and Y that are opened when the random variable set C (bold) is given as a condition, that is, when X and Y become dependent on each other. Further, let I(X, Y | C (bold)) be the conditional mutual information of X and Y given C (bold). Then monotone DAG faithfulness is defined as follows: a DAG faithful model N = <V, E, P> is monotone DAG faithful if and only if, for all X and Y that are elements of V, Open_N(X, Y | C) ⊆ Open_N(X, Y | C') implies I(X, Y | C) ≤ I(X, Y | C').
In TPDA, three phases called a tree structure preparation (Drafting) phase, an edge increase (Thickening) phase, and an edge reduction (Thinning) phase are executed, and the structure is learned by finally directing edges.
When learning the Bayesian network structure using the TPDA algorithm, first, the main input is tabular data expressed in CSV format or relational database relations, and what random variables and what states (realized values) are included in the data. Is described (hereinafter referred to as “data specification description information”).
In the tree structure preparation phase, the mutual information represented by the following equation (2) is calculated for all pairs of random variables included in the input data.
Here, capital letters X and Y represent random variables, and small letters x and y represent corresponding state values. In the tree structure preparation phase, in addition from the pair with the larger mutual information value calculated, as long as the structure of the entire graph is a tree structure (unless the graph forms a cycle), the pair of random variables Add an undirected edge to
Next, in the edge increase phase, for each pair of random variables for which there is no undirected edge at the stage of completing the tree structure preparation phase, the two probabilities on the path starting from the random variable pair as the start point and the end point The conditional mutual information is calculated with the node adjacent to the variable node as the first set of conditions. The conditional mutual information is calculated by the following equation (3).
Here, the capital letter C represents a random variable set, and the capital letter c represents a state value set corresponding to each variable. If the calculated conditional mutual information amount is equal to or greater than the predetermined threshold ε, the random variable set C (bold) is reduced and the calculation of Expression (3) is repeated. When a condition set in which the conditional mutual information is less than ε is found, a record is registered in the global cut set without adding an undirected edge between the random variable pairs. On the other hand, if a condition set whose conditional mutual information is less than ε is not found before the condition set becomes an empty set, an undirected edge is added between the random variable pairs.
Subsequently, in the edge reduction phase, if there is a path other than the undirected edge for each random variable pair in which the undirected edge exists at the stage when the edge increase phase is completed, the undirected edge is temporarily deleted. As in the edge increase phase, the conditional mutual information is calculated until a condition set that is less than the threshold ε is found while gradually reducing the condition set. If a condition set that is less than the threshold ε is found, the record is registered in the global cut set while the undirected edge is deleted. If the condition set that is less than the threshold ε is not found, the temporarily deleted undirected edge is added and restored. Next, paying attention to each random variable pair with an undirected edge at the present time, if any of the random variable pairs has three or more adjacent nodes except for the counterpart of the pair, a more precise edge necessity determination is performed. Do. Specifically, on the path of the random variable pair, for each random variable, compare the size of the set having the adjacent node of the random variable and a node further adjacent to the adjacent node as an element, and the smaller one is the first A condition set. Then, as in the past, the conditional mutual information is calculated until a condition set that is less than the threshold ε is found while gradually reducing the condition set. If a condition set that is less than the threshold ε is found, a record is registered in the global cut set while the undirected edge is deleted, and if not found, the deleted edge is temporarily added and restored.
After the above three phases of tree structure preparation, edge increase and edge reduction, work to determine the direction of the undirected edge added by these phases is performed. Specifically, for all sets of three nodes X, Y, and Z where X and Y and Z and Y are directly connected by undirected edges but X and Y are not directly connected, The direction is determined for an edge whose direction can be determined based on whether or not these nodes are included in the global cutting set. In TPDA, the Bayesian network structure is learned as described above.
The TPDA described above is the fastest-running currently known constraint-based learning algorithm, and is an algorithm suitable for processing large-scale variables and large amounts of data. However, in an example where the number of random variables is still larger, there is a problem that it becomes difficult to calculate a conditional independent test. That is, since the number of variables corresponding to c (bold) in the joint probability distribution P (x, y, c (bold)) portion on the right side of the conditional mutual information shown in Expression (3) increases, the amount of calculation increases. It becomes difficult to calculate. In addition, as the pattern of the joint probability distribution increases, there is a problem that more missing values are generated that do not contribute to the calculation result.
- The inventors of the present application discovered that, by incorporating a frequent item set extraction algorithm into a constraint-based learning algorithm, Bayesian network structure learning can be performed faster and with less variation in processing time than with the conventional TPDA described above.
- The inventors of the present application further discovered that, when searching for a cut set, by calculating the conditional mutual information in ascending order of the subset size of the condition set and using the MDF assumption as early as possible, that is, by comparing the conditional mutual information given two condition variables with the conditional mutual information given each single condition variable in the condition variable set and deleting, at the second stage, all variables not included in the cut set from the search target, the cut set can be searched for in at most only three stages. This enables even faster Bayesian network structure learning than the conventional TPDA.
The present invention provides an information processing apparatus and program for performing Bayesian network structure learning based on these new algorithms.
According to one aspect of the present invention, the information processing apparatus of the present invention is an information processing apparatus that performs Bayesian network structure learning from input data that includes information about a plurality of random variables and the states that each random variable takes. The information processing apparatus is a unit that generates a tree-structured graph for input data, and adds an edge between the random variable pairs for each random variable pair whose mutual information amount is equal to or greater than a first threshold value. Includes means for generating a tree-structured graph that adds an edge when the graph structure remains a tree structure. The information processing apparatus further adds an edge when an edge is necessary for each random variable pair in which the mutual information amount is not less than the first threshold but the edge is not added by the means for generating the tree structure graph. Means to do. The means includes a random variable included in a set of nodes on a path between the two random variable nodes constituting the random variable pair to which the edge is not added and adjacent to one of the two random variable nodes. If a pair whose condition value is less than the first threshold is found using the set of as a condition set, an edge is not added between the two random variables, otherwise Add an edge. In addition, in the calculation of the conditional mutual information, the means includes a second threshold value in which a joint probability distribution for the states of the two random variables and the state set corresponding to each random variable is equal to or less than a first threshold value. If it is less, the calculation of the related component is omitted.
The means for generating the tree-structured graph calculates the peripheral probability of each random variable included in the input data, and the peripheral probability taking a state with any random variable constituting the random variable pair is less than the second threshold In this case, the mutual information amount of each random variable pair may be calculated by omitting the calculation of the component of the related mutual information amount. Further, the means for adding an edge includes a set of nodes adjacent to one of the two random variable nodes on a path between two random variable nodes constituting a random variable pair to which no edge is added. Is the final condition set, and the conditional mutual information about the two random variable nodes and the condition set C is the size from when the size of the condition set C is 1 to the size of the final condition set. When a conditional mutual information amount that is less than the first threshold value is obtained in the process, an edge is not added between the two random variables, otherwise it is determined to be necessary Then, an edge may be added. The information processing apparatus further determines, for each random variable pair having an edge after processing by the means for adding an edge, whether or not an edge is necessary, and deletes the edge if unnecessary, Means for directing the edge.
According to another aspect of the present invention, the information processing apparatus of the present invention uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes. It is the information processing apparatus which performs. The information processing apparatus determines whether or not an edge should be added between a certain random variable pair by obtaining a conditional mutual information amount, and at the time of the determination, for the two random variable nodes constituting the random variable pair, A conditional mutual information amount is calculated using a set of random variables included in a set of nodes adjacent to one of the two random variable nodes on the path between the two as a condition set, and the value is less than the first threshold value. If a set is found, no edge is added between the two random variables, otherwise an edge is added. In addition, in the calculation of the conditional mutual information amount, the information processing apparatus has a simultaneous probability distribution for the states of the two random variables and the state set corresponding to each random variable less than a second threshold that is equal to or less than a first threshold. In this case, calculation of related components is omitted. For the two random variable nodes constituting the random variable pair, the information processing apparatus uses, as a final condition set, a set of nodes on the path between them and adjacent to either of the two random variable nodes. The conditional mutual information about the two random variable nodes and the condition set C is calculated while increasing the size from the case where the size of the condition set C is 1 to the size of the final condition set. When a conditional mutual information amount that is less than the first threshold is obtained, an edge is not added between the two random variables, and an edge is added if it is determined that it is not necessary. May be.
According to another aspect of the present invention, the present invention is a program that causes a computer to operate like the information processing apparatus described above.
According to another aspect of the present invention, the information processing apparatus of the present invention is an information processing apparatus that performs Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes. The information processing apparatus is a unit that generates a tree-structured graph for input data, and adds an edge between the random variable pairs for each random variable pair whose mutual information amount is equal to or greater than a first threshold value. Includes means for generating a tree-structured graph that adds an edge when the graph structure remains a tree structure.
- The information processing apparatus further includes means for adding an edge, when an edge is necessary, for each random variable pair whose mutual information is not less than the first threshold but to which no edge was added by the means for generating the tree-structured graph. The means for adding the edge takes, as a candidate condition set, a condition set containing the random variables included in the set of nodes that lie on a path between the two random variable nodes constituting the random variable pair to which the edge was not added and that are adjacent to one of the two random variable nodes, calculates the conditional mutual information with each single random variable in the candidate condition set as the condition set, and, when at least one of the calculated conditional mutual information values is less than the first threshold, ends the processing for the random variable pair without adding an edge between the two random variables.
Further, the means for adding the edge calculates the conditional mutual information using any two random variable pairs in the candidate condition set as a condition set when the above processing is not completed, and the calculated conditional mutual information If at least one of the quantities is less than the first threshold, the process ends without adding an edge between the two random variables. When the calculated conditional mutual information is larger than the conditional mutual information that has already been calculated using only one random variable as a condition set, the one random variable is deleted from the candidate condition set and calculated If the conditional mutual information amount is larger than the conditional mutual information amount that has already been calculated using only the other random variable as a condition set, the other random variable is deleted from the candidate condition set.
- Further, when the above processing has not ended, the means for adding the edge calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and when at least one of the calculated conditional mutual information values is less than the first threshold, ends the processing without adding an edge between the two random variables.
The information processing apparatus further includes means for adding an edge that adds an edge between the two random variables when the above processing is not completed.
- The information processing apparatus may further include means for determining, for each random variable pair having an edge after the above processing, whether or not the edge is necessary and deleting the edge if it is unnecessary, and means for directing each edge.
According to another aspect of the present invention, the information processing apparatus of the present invention uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes. It is the information processing apparatus which performs. The information processing apparatus determines whether or not an edge should be added between a certain random variable pair by obtaining a conditional mutual information amount.
In the determination, the information processing apparatus, for the two random variable nodes constituting the random variable pair, selects a random variable included in a set of nodes on a path between the adjacent random variable nodes and adjacent to either of the two random variable nodes. Conditional mutual information is calculated by using a condition set including the candidate condition set and each one of the random variables in the candidate condition set as a condition set, and at least one of the calculated conditional mutual information is less than the first threshold value In such a case, the processing for the random variable pair is terminated without adding an edge between the two random variables.
In addition, when the above processing is not completed, the information processing apparatus calculates a conditional mutual information amount using any two sets of random variables in the candidate condition set as a condition set, and at least of the calculated conditional mutual information amount If either is less than the first threshold, the process is terminated without adding an edge between the two random variables. When the calculated conditional mutual information is larger than the conditional mutual information that has already been calculated using only one random variable as a condition set, the one random variable is deleted from the candidate condition set and calculated If the conditional mutual information amount is larger than the conditional mutual information amount that has already been calculated using only the other random variable as a condition set, the other random variable is deleted from the candidate condition set.
- In addition, when the above processing does not end, the information processing apparatus calculates the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and when at least one of the calculated conditional mutual information values is less than the first threshold, the processing ends without adding an edge between the two random variables.
The information processing apparatus further includes means for adding an edge between the two random variables when the above processing does not end.
According to another aspect, the present invention is a program that causes a computer to operate as the information processing apparatus described above.
According to the present invention, Bayesian network structure learning can be performed at high speed and in stable computation time even when there are a large number of random variables and a large amount of data, a situation that was difficult to handle even with the TPDA algorithm, the fastest of the previously known constraint-based learning algorithms. The present invention therefore broadens the range of industrial applications of Bayesian networks.
FIG. 1 shows a block diagram of an information processing apparatus 100 for executing Bayesian network structure learning according to an embodiment of the present invention.
The control unit 102 controls the overall flow of processing in the information processing apparatus 100. As preprocessing, the control unit 102 checks whether the arguments and parameters specified at the start of structure learning are valid. If they are valid, it causes the data specification analysis unit 104 to analyze the data specification and then causes the structure learning unit 110 to execute the main processing of the algorithm. The data specification analysis unit 104 reads the data specification description file 108 and prepares to analyze the data that is the main input. The main input data is tabular data expressed, for example, in CSV format or as a relation in a relational database. The data is, for example, entered into the information processing apparatus 100 by a user and stored in the database 106 inside the apparatus. Alternatively, the data may be stored in a database on a communication network connected to the information processing apparatus 100 by wire or wirelessly, and the apparatus may access that database and receive the data in response to a command requesting Bayesian network structure learning. The data store holding the data may be a file, a relational database, or a two-dimensional array in memory; in the following description of this embodiment, it is assumed to be a relational database.
FIG. 2 shows an example of the information contained in such data. Each column corresponds to an ID or a random variable name, and each row consists of an ID and the state (realized value) of each random variable. The example in FIG. 2 shows purchase data for six customers of a store handling four products A, B, C, and D, where the coupons a customer can use are of two types, T1 and T2 (which cannot be combined), n denotes that no coupon was used, y denotes that the customer purchased a product, and n denotes that the customer did not purchase it.
The data specification description file 108 contains information about the random variables included in the data and the states (realized values) of each random variable. It is, for example, a CSV file in which, for a random variable with n states, each line lists the random variable name followed by state 1, state 2, ..., state n. For the example of FIG. 2, the random variables representing the customers' purchase history data and their realized values are described in the data specification description file 108 as follows:
Coupon, T1, T2, n
A, y, n
B, y, n
C, y, n
D, y, n
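As an illustration only (not part of the embodiment), a data specification description file in this format could be read with a small helper such as the following minimal Python sketch; the class name DataSpec and its fields are hypothetical.

```python
import csv

class DataSpec:
    """Reads a data specification description file in which each line lists a
    random variable name followed by its state names."""

    def __init__(self, path):
        self.states = {}  # variable name -> list of state names
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if not row:
                    continue
                name, *state_names = [cell.strip() for cell in row]
                self.states[name] = state_names

    @property
    def variables(self):
        return list(self.states)

    def num_states(self, name):
        return len(self.states[name])

# Example: spec = DataSpec("purchase_spec.csv"); spec.num_states("Coupon") == 3
```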
Returning to FIG. 1, the data specification analysis unit 104 reads the data specification description file 108, retains the name of each random variable, the number of random variables, the state names of each random variable, the number of states of each random variable, and the total number of data records, and provides this information to the other components.
The structure learning unit 110 executes the Bayesian network structure learning algorithm proposed in the present application; its operation is described in detail below.
The query management unit 112 calculates mutual information and conditional mutual information from the input data. To obtain the probability distributions needed for these calculations, it must issue database queries to the database 106 that count the number of records matching given conditions. As described in detail below, in one embodiment the present invention speeds up the overall processing by using a frequent item set extraction algorithm to avoid computations of low necessity when calculating mutual information and conditional mutual information. In addition, while the algorithm is running, the record count for the same condition is referenced multiple times; since obtaining a query result is relatively time-consuming, issuing a query each time would be inefficient. The query management unit 112 therefore passes each query result and its corresponding condition to the query result cache unit 114 to be retained. When a record count is needed, the query management unit 112 first asks the query result cache unit 114; if the result has already been obtained, it is reused, and only if it has not been obtained yet is a query issued to retrieve the count from the database 106.
The query result cache unit 114 holds query results in an internal hash table whose keys are query search conditions and whose values are the corresponding record counts. It has a function of answering inquiries from the query management unit 112 as to whether a result for a given search condition is already held, and a function of storing new key-value pairs.
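The following is a minimal sketch of such a count cache combined with count queries, assuming a Python DB-API connection (for example sqlite3) and a table whose column names match the random variable names; all names here are assumptions, not the embodiment's actual interface.

```python
class QueryResultCache:
    """Hash table keyed by the search condition, valued by the matching record count."""

    def __init__(self):
        self._counts = {}

    def get(self, condition_key):
        return self._counts.get(condition_key)

    def put(self, condition_key, count):
        self._counts[condition_key] = count


def count_records(conn, table, condition, cache):
    """Return the number of records matching `condition` (a dict of column -> value),
    consulting the cache before issuing a COUNT(*) query."""
    key = frozenset(condition.items())
    cached = cache.get(key)
    if cached is not None:
        return cached
    if condition:
        where = " AND ".join(f"{col} = ?" for col in condition)
        cur = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}",
                           tuple(condition.values()))
    else:
        cur = conn.execute(f"SELECT COUNT(*) FROM {table}")
    count = cur.fetchone()[0]
    cache.put(key, count)
    return count
```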
When the conditional mutual information between a pair of random variables is calculated, the cut set holding unit 116 holds, as elements of a global cut set, records consisting of the random variable pair and the condition variable set for which the conditional mutual information is below the threshold ε. The cut set is required when directing the undirected edges.
The graph structure construction unit 118 constructs the graph structure of the Bayesian network estimated by the structure learning unit 110. As data structures shared with the other components, it builds and manages 1) an array of nodes representing the random variables and 2) an array of directed or undirected edges representing dependencies between pairs of random variables.
As a result of the operation of the components in FIG. 1, the information processing apparatus 100 outputs a Bayesian network structure description file 120, which contains the information on the Bayesian network structure estimated by the apparatus. An estimated edge whose direction has been detected (a directed edge) is written as "parent variable name → child variable name", and an edge whose direction could not be detected (an undirected edge) is written as "variable name 1 - variable name 2". In the example of FIG. 2, if structure learning by the information processing apparatus 100 estimates that Coupon is a parent variable of A and of D, that A is a parent variable of B, and that an edge exists between B and C although it is unknown which is the parent, the output Bayesian network structure description file 120 contains the following:
Coupon → A
Coupon → D
A → B
B - C
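Purely as an illustration of this output format, a structure description file could be written from lists of directed and undirected edges with a helper like the one below; the function name and edge representation are assumptions.

```python
def write_structure_file(path, directed_edges, undirected_edges):
    """directed_edges: iterable of (parent, child); undirected_edges: iterable of (v1, v2)."""
    with open(path, "w", encoding="utf-8") as f:
        for parent, child in directed_edges:
            f.write(f"{parent} → {child}\n")
        for v1, v2 in undirected_edges:
            f.write(f"{v1} - {v2}\n")

# write_structure_file("bn_structure.txt",
#                      [("Coupon", "A"), ("Coupon", "D"), ("A", "B")],
#                      [("B", "C")])
```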
An embodiment of the present invention that enables Bayesian network structure learning that is even faster, and has less variation in processing time, than conventional TPDA by incorporating a frequent item set extraction algorithm into the constraint-based learning algorithm is described in detail below.
FIG. 3 is a flowchart of the processing performed by the information processing apparatus 100 of this embodiment. Upon receiving a command to execute Bayesian network structure learning, the information processing apparatus 100 starts processing (step 302). The command is configured to include predetermined operation parameters, including the connection information for accessing the database 106 that stores the data on which structure learning is based and the name of the data specification description file. The operation parameters also include the threshold ε for the mutual information and conditional mutual information used in structure learning (for example, 0.01) and the minimum support δ used in frequent item set extraction (0 ≤ δ ≤ ε; for example, 0.0005). They may further include the file name of the Bayesian network structure description file to be output.
The information processing apparatus 100 then performs initial processing. The control unit 102 checks whether the operation parameters are valid (step 304); if there is an error, the processing ends (step 320), and if they are valid, the control unit 102 causes the data specification analysis unit 104 to analyze the data specification (step 306). The data specification analysis unit 104 reads the data specification description file 108 and retains the name of each random variable contained in the data, the number of random variables, the names of all states each random variable can take, and the number of states. Next, the data specification analysis unit 104 connects to the database 106 using the database connection information, obtains the total number of records, and retains it (step 308). After step 308, the control unit 102 transfers control to the structure learning unit 110.
The structure learning unit 110 performs tree structure preparation processing and generates a tree structure for the given data (step 310). The processing of step 310 is shown in more detail in FIG. 4.
During the processing of FIG. 4, the mutual information (equation (2)) must be calculated for pairs of random variables. This embodiment speeds up this mutual information calculation, as described below, by incorporating the concept of a frequent item set extraction algorithm.
A frequent item set extraction algorithm is a general term for data mining algorithms that rapidly extract, from among the item sets appearing in the data, those whose support (that is, the joint probability with which the item set appears) is at least the minimum support δ. Such algorithms prune the search efficiently by exploiting the anti-monotonicity of item set support: for item sets A and B, if A ⊆ B, then (support of A) ≥ (support of B); in other words, if A is not a frequent item set, then no set B containing A is a frequent item set either.
One example of a frequent item set extraction algorithm is Apriori (see Agrawal, R. and Srikant, R.: Fast Algorithms for Mining Association Rules, in Proc. of the 20th Int'l Conference on Very Large Databases, pp. 487-499, Santiago, Chile (1994)). Pseudo code for Apriori is shown in FIG. 5.
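The Apriori pseudo code itself appears only in FIG. 5 and is not reproduced here; as a rough sketch of the level-wise idea under the anti-monotonicity property above, a generic version might look as follows, where the support function (assumed, not part of the embodiment) maps an item set to its relative frequency.

```python
from itertools import combinations

def apriori(items, support, min_support):
    """Level-wise frequent item set search.
    items: iterable of single items (e.g. (variable, state) pairs).
    support: function mapping a frozenset of items to a value in [0, 1].
    Returns a dict: size -> set of frequent item sets of that size."""
    frequent = {1: {frozenset([i]) for i in items
                    if support(frozenset([i])) >= min_support}}
    k = 1
    while frequent[k]:
        candidates = set()
        for a in frequent[k]:
            for b in frequent[1]:
                c = a | b
                # prune by anti-monotonicity: every k-subset must already be frequent
                if len(c) == k + 1 and all(frozenset(s) in frequent[k]
                                           for s in combinations(c, k)):
                    candidates.add(c)
        frequent[k + 1] = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent
```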
In this embodiment, in order to speed up the mutual information calculation using the frequent item set extraction algorithm, each combination of random variables and their values is regarded as an item set, and those components of the right-hand side of equation (2), which expresses the mutual information, that involve a combination of random variables and values whose joint probability is below the minimum support δ are excluded from the calculation.
For example, suppose the random variable X takes the states x1, x2, ..., xn (that is, X = {x1, x2, ..., xn}) and the random variable Y takes the states y1, y2, ..., ym (that is, Y = {y1, y2, ..., ym}). In this case, the mutual information I(X, Y) of the two distinct random variables X and Y is expressed by

$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} P(x, y)\,\log\frac{P(x, y)}{P(x)\,P(y)} \qquad (4)$$
Here, if P(x) or P(y) is below the minimum support δ, then P(x, y) is also below the minimum support δ. Since 0 ≤ δ ≤ ε as stated above, in that case

$$P(x, y)\,\log\frac{P(x, y)}{P(x)\,P(y)} < \varepsilon \qquad (5)$$

holds. Therefore, by determining that P(x) or P(y) is below the minimum support δ, it can be concluded that the corresponding component summed on the right-hand side of equation (4) (the left-hand side of equation (5)) is below ε without computing that component directly.
Returning to FIG. 4, in step 402 the query management unit 112 computes, for each random variable, the marginal probability of every state it can take. Each pair of a random variable and a state (written as <X, x>, <Y, y>, and so on) whose marginal probability is at least the minimum support δ is added to the size-1 frequent item set F1. In addition, the random variable and state are used as a search condition key, and the number of records matching that condition is stored as the value in the query result cache unit 114. FIG. 6 shows an example of the genFreqItemSet1 routine that performs this procedure in the query management unit 112.
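The actual routine is genFreqItemSet1 in FIG. 6; the following is only a hedged Python sketch of the same step, reusing the hypothetical DataSpec, cache, and count helper from the earlier sketches.

```python
def gen_freq_item_set_1(spec, total, count_fn, cache, delta):
    """Build the size-1 frequent item set F1: all <variable, state> pairs whose
    marginal probability is at least the minimum support delta.
    count_fn(condition_dict) returns the number of matching records."""
    f1 = set()
    for var in spec.variables:
        for state in spec.states[var]:
            n = count_fn({var: state})
            cache.put(frozenset({(var, state)}), n)  # keep the count for later reuse
            if n / total >= delta:
                f1.add((var, state))
    return f1
```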
The query management unit 112 then computes the mutual information for every pair of random variables and adds each pair whose mutual information is at least ε to the random variable pair array. In doing so, the query management unit 112 speeds up the computation by using the frequent item set extraction algorithm described above.
Specifically, for every combination of each random variable in a pair (denoted here X and Y) with one of its states (for example, <X, x1>, <Y, y1>), the query management unit 112 determines whether both elements of the combination (here <X, x1> and <Y, y1>) are contained in the frequent item set F1, that is, whether both marginal probabilities are at least the minimum support δ (step 404). If at least one of them is not contained in F1 (its marginal probability is below δ) ("No" in step 404), the mutual information component

$$P(x, y)\,\log\frac{P(x, y)}{P(x)\,P(y)} \qquad (6)$$

on the right-hand side of equation (4) is not computed.
If both are contained ("Yes" in step 404), it is determined whether the joint probability P(x, y) for this combination is at least δ (step 406). If it is ("Yes" in step 406), the corresponding mutual information component (equation (6)) is computed (step 408). In addition, the current combination of random variables and states (for example, <X, x1>, <Y, y1>) is used as a search condition key, and the number of records matching that condition is stored as the value in the query result cache unit 114. The current set of random variables and states is also added to the size-2 frequent item set F2.
After repeating steps 404 to 408 for all combinations (step 410), the query management unit 112 adds up the mutual information components computed so far to obtain the mutual information of the random variable pair currently under consideration (step 412). If the mutual information is at least ε, the structure learning unit 110 adds the pair to the random variable pair array. Steps 404 to 412 are repeated for all random variable pairs, so that the mutual information is computed for every pair (step 414).
Next, the structure learning unit 110 sorts the random variable pairs stored in the random variable pair array in descending order of mutual information (step 416). Then, in order of decreasing mutual information, it asks the graph structure construction unit 118 whether the graph would remain a tree if an edge were added between the pair (step 418). If adding the edge would create a cycle, the graph structure construction unit 118 notifies the structure learning unit 110 that the graph would no longer be a tree ("No" in step 418). If the graph structure construction unit 118 reports that the graph remains a tree ("Yes" in step 418), the structure learning unit 110 instructs the graph structure construction unit 118 to add an undirected edge between the pair currently under consideration and deletes that pair from the random variable pair array (step 420). Steps 418 and 420 are repeated for all random variable pairs in the random variable pair array (step 422).
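The check in step 418 amounts to asking whether the two nodes are already connected. A compact way to sketch this (the embodiment delegates the check to the graph structure construction unit, so this union-find structure is only an illustration) is shown below.

```python
class UnionFind:
    """Tracks connected components so that an edge whose endpoints are already
    connected (which would create a cycle) can be rejected."""

    def __init__(self, nodes):
        self.parent = {n: n for n in nodes}

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def union(self, a, b):
        """Merge and return True if a and b were in different components,
        i.e. the edge a-b keeps the graph a forest."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

# for x, y, mi in pairs_sorted_by_mutual_information_descending:
#     if uf.union(x, y):
#         graph.add_undirected_edge(x, y)   # step 420
#     # otherwise the pair stays in the array for the Thickening phase
```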
As described above, the processing of step 310 generates a tree-structured graph by adding, for each random variable pair whose mutual information is at least ε, an edge between the pair whenever the graph remains a tree after the edge is added. In doing so, the marginal probability of each random variable contained in the input data is computed, and the mutual information of each pair is calculated while omitting the related components whenever the marginal probability of one of the pair's random variables taking a given state is below δ.
FIG. 7 shows an example of the calcMutualInformation routine used for the mutual information computation in this embodiment when Apriori is adopted as the frequent item set extraction algorithm and incorporated.
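The calcMutualInformation routine itself is the one shown in FIG. 7; the following is only a rough Python sketch of the pruned computation of equation (4) described above, reusing the hypothetical count_fn and the frequent item sets F1 and F2 from the earlier sketches.

```python
import math

def calc_mutual_information(x_var, y_var, spec, total, count_fn, f1, f2, delta):
    """Mutual information of x_var and y_var, skipping components whose marginal
    or joint probability is below the minimum support delta (steps 404 to 412)."""
    mi = 0.0
    for x in spec.states[x_var]:
        if (x_var, x) not in f1:             # P(x) < delta: skip every term with this x
            continue
        p_x = count_fn({x_var: x}) / total
        for y in spec.states[y_var]:
            if (y_var, y) not in f1:         # P(y) < delta: skip this term
                continue
            p_y = count_fn({y_var: y}) / total
            p_xy = count_fn({x_var: x, y_var: y}) / total
            if p_xy < delta or p_xy == 0.0:  # joint support too small: term below epsilon
                continue
            f2.add(frozenset({(x_var, x), (y_var, y)}))
            mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi
```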
Returning to FIG. 3, the structure learning unit 110 executes edge increase processing in step 312. For each random variable pair whose mutual information is at least ε but for which no undirected edge was added during tree structure preparation because adding it would have broken the tree structure (that is, each pair remaining in the random variable pair array), the structure learning unit 110 uses conditional mutual information to determine whether an edge is actually needed, and adds an undirected edge if it is judged to be needed. An example of the Thickening (edge increase) routine is shown in FIG. 8.
FIG. 9 shows the details of the main processing executed in the Thickening routine. For each random variable pair whose mutual information is at least ε but which has no undirected edge (that is, each pair remaining in the random variable pair array), the structure learning unit 110 sets as the final condition set (ConditionSet, C (bold)) the set of nodes that lie on a path from one of the pair's two random variable nodes (for example, X and Y) to the other and are adjacent to either of those two nodes (step 902). It also generates a candidate condition set C' (bold) containing the same random variables as the final condition set.
The structure learning unit 110 then causes the query management unit 112 to compute the conditional mutual information for one random variable contained in the final condition set (for example, {C1, C2, C3, C4, ...}), that is, one of the random variables that lie on the path between X and Y and are adjacent to X or Y.
The query management unit 112 first computes the conditional mutual information I(X, Y | C (bold)) for the case where the condition set contains only one of the random variables in the final condition set (for example, C1) (step 904). This value is provisionally stored as the minimum conditional mutual information, and the condition set {C1} is provisionally stored as the condition set that minimizes the conditional mutual information.
The query management unit 112 then computes the conditional mutual information for the case where the condition set contains that random variable (here, C1) together with one other random variable (one of C2, C3, C4, ...) (step 906).
The structure learning unit 110 determines whether the computed conditional mutual information is below ε (step 908). If it is ("Yes" in step 908), the pair of the random variable pair ({X, Y}) and the condition set (for example, {C1, C2}) is stored in the global cut set in the cut set holding unit 116, and it is judged that no edge is needed between the pair (step 910). If it is at least ε ("No" in step 908), it is determined whether the computed conditional mutual information is larger than the current minimum conditional mutual information (step 912). If it is larger ("Yes" in step 912), the structure learning unit 110 deletes the other random variable from the candidate condition set C' (bold) (step 914). If it is smaller ("No" in step 912), the structure learning unit 110 stores the conditional mutual information at this point as the minimum conditional mutual information and stores the condition set at this point as the condition set that minimizes the conditional mutual information (step 916).
Next, for each random variable remaining in the candidate condition set other than the one already considered (C1 in the example above), the query management unit 112 computes the conditional mutual information for the case where the condition set contains only that variable (step 918). If the computed value is smaller than the current minimum conditional mutual information, it is stored as the minimum, and the condition set at this point is stored as the condition set that minimizes the conditional mutual information (step 920).
The query management unit 112 then computes the conditional mutual information for the case where the condition set contains the random variable currently under consideration together with one of the other random variables remaining in the candidate condition set (step 922). If the computed value is below ε ("Yes" in step 924), the structure learning unit 110 stores the pair of the random variable pair and the condition set in the global cut set in the cut set holding unit 116 and judges that no edge is needed between the pair (step 910). If it is at least ε ("No" in step 924), it is determined whether the value is larger than the value computed in step 918 (step 926). If it is larger ("Yes" in step 926), the other random variable is deleted from the candidate condition set (step 928). If it is smaller ("No" in step 926), the conditional mutual information at this point is stored as the minimum, and the condition set at this point is stored as the condition set that minimizes the conditional mutual information (step 930).
Processing continues in FIG. 10. After steps 918 to 930 have been repeated for all random variables remaining in the candidate condition set (step 1002), the query management unit 112 computes the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set (step 1004). It is then determined whether the computed value is below ε (step 1006). If it is ("Yes" in step 1006), the structure learning unit 110 stores the pair of the random variable pair and the condition set in the global cut set in the cut set holding unit 116 (step 1008) and judges that no edge is needed between the pair. If it is at least ε ("No" in step 1006), the structure learning unit 110 judges that an edge is needed between the pair (step 1010).
In the examples of FIGS. 9 and 10, the conditional mutual information is computed while increasing the size of the condition set from 1 to 2, and if no value below ε is obtained, the conditional mutual information is computed using all the random variables remaining in the candidate set as the condition set and checked against ε. Following the same idea, however, the size of the condition set may be increased further to 3, 4, and so on, performing the processing of FIG. 9 for every size up to the size of the final condition set.
As described above, in step 312 an edge is added, if needed, for each random variable pair whose mutual information is at least ε but for which no edge was added in step 310. For the two random variable nodes constituting the pair, conditional mutual information is computed using, as condition sets, combinations of random variables contained in the set of nodes that lie on a path between the two nodes and are adjacent to either of them; if a combination is found for which the value is below ε, no edge is added between the two random variables, and otherwise an edge is added. More specifically, with the set of nodes on the path between the two random variable nodes that are adjacent to either of them as the final condition set, the conditional mutual information of the two random variable nodes given the condition set C (bold) is computed while increasing the size of C (bold) from 1 up to the size of the final condition set; if a value below ε is obtained in the course of this process, no edge is added between the two random variables, and otherwise an edge is judged to be needed and is added. In the computation of the conditional mutual information, whenever the joint probability of the states of the two random variables and the state set corresponding to the condition variables is below δ, the computation of the related components is omitted.
FIG. 11 shows an example of the edgeNeeded_H routine that is called in the Thickening routine of FIG. 8 and used in the processing of FIGS. 9 and 10, and FIGS. 12A and 12B show examples of the edgeNeededBody routine called from the edgeNeeded_H routine. The routine of FIG. 12A computes the conditional mutual information while increasing the size of the condition set C (bold) from 1 to 2; the routine of FIG. 12B performs the computation while increasing the size of the condition set from 1 up to the size of the final condition set.
In steps 904, 906, 918, and 922 of FIG. 9 and step 1004 of FIG. 10, conditional mutual information is computed using various combinations of random variables as the condition set. In this embodiment, the conditional mutual information is first computed for a small condition set (size 1) and then for progressively larger condition sets. The computation is repeated until a condition set whose conditional mutual information is below the threshold ε is found; if such a set is found, it is judged that no edge is needed between the random variable pair under consideration, and if none is found, it is judged that an edge is needed.
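The following is a deliberately simplified sketch of this search over condition sets, not the edgeNeeded_H routine of FIG. 11 itself; cond_mi and cut_set are hypothetical stand-ins for the conditional mutual information computation and the global cut set, and the handling of the running minimum is compressed.

```python
def edge_needed(x, y, final_condition_set, cond_mi, epsilon, cut_set):
    """Return True if an edge X-Y is judged necessary.
    cond_mi(x, y, cond) computes I(X, Y | cond) for a tuple of condition variables."""
    candidates = list(final_condition_set)
    # size-1 condition sets
    best = {}
    for c in candidates:
        v = cond_mi(x, y, (c,))
        if v < epsilon:
            cut_set[frozenset((x, y))] = {c}
            return False                      # pair can be separated: no edge needed
        best[c] = v
    # size-2 condition sets, shrinking the candidate set as in FIG. 9
    for i, c1 in enumerate(candidates):
        for c2 in candidates[i + 1:]:
            if c1 not in best or c2 not in best:
                continue
            v = cond_mi(x, y, (c1, c2))
            if v < epsilon:
                cut_set[frozenset((x, y))] = {c1, c2}
                return False
            if v > best[c1]:
                del best[c1]                  # c1 stops being a useful condition variable
                break
            if v > best[c2]:
                del best[c2]
    # all remaining candidates together
    remaining = tuple(best)
    if remaining and cond_mi(x, y, remaining) < epsilon:
        cut_set[frozenset((x, y))] = set(remaining)
        return False
    return True
```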
According to this embodiment, the frequent item set extraction algorithm is also used to speed up the conditional mutual information computations performed in FIGS. 9 and 10: each combination of random variables and their values is regarded as an item set, and components related to combinations of joint random variables and values whose probability is below the minimum support δ are excluded from the computation. For example, if X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym}, the conditional mutual information I(X, Y | C (bold)) of the two distinct random variables X and Y is expressed by

$$I(X, Y \mid \mathbf{C}) = \sum_{x \in X} \sum_{y \in Y} \sum_{\mathbf{c}} P(x, y, \mathbf{c})\,\log\frac{P(x, y \mid \mathbf{c})}{P(x \mid \mathbf{c})\,P(y \mid \mathbf{c})} \qquad (7)$$

Here, if P(x, y, c (bold)) is below the minimum support δ, the corresponding component of the right-hand side of equation (7) is also below δ. Since 0 ≤ δ ≤ ε as stated above, for that component

$$P(x, y, \mathbf{c})\,\log\frac{P(x, y \mid \mathbf{c})}{P(x \mid \mathbf{c})\,P(y \mid \mathbf{c})} < \varepsilon$$

holds. Therefore, it is known that the value of each such component on the right-hand side of equation (7) is below ε without computing it directly.
The conditional mutual information computation in this embodiment, sped up in this way by the frequent item set extraction algorithm, proceeds as follows. In this computation, the following processing is executed for every state pattern (q) in the set (Q) of state patterns that can be taken by the random variables included in the given condition set (C (bold) in equation (7)).
FIG. 13 shows the processing flow when the number of random variables (|q|) in the given condition set is 1. The query management unit 112 computes P(c (bold)) (where c (bold) is the state set corresponding to the condition variables). It is determined whether this value is below δ (step 1102); if it is ("Yes" in step 1102), the query management unit 112 does not compute the conditional mutual information component (a component of the right-hand side of equation (7)). If it is at least δ ("No" in step 1102), it is determined whether P(x | c (bold)) is 0 (step 1104); if so ("Yes" in step 1104), the component is not computed. If it is not 0 ("No" in step 1104), it is determined whether P(y | c (bold)) is 0 (step 1106); if so ("Yes" in step 1106), the component is not computed. If it is not 0 ("No" in step 1106), the structure learning unit 110 considers every subset, smaller by one element, of the set consisting of the two random variables of the current pair and the current condition set (step 1108) and determines whether all of those subsets are contained in the frequent item sets (F1, F2, ...) (step 1110). If not ("No" in step 1110), the query management unit 112 does not compute the component. If they are ("Yes" in step 1110), it is determined whether P(x, y, c (bold)) is below δ (step 1112); if so ("Yes" in step 1112), the component is not computed. If it is at least δ ("No" in step 1112), the query management unit 112 computes the conditional mutual information component for the current combination of random variables and states (step 1114).
The state of one random variable of the current pair (for example, X) is fixed, and steps 1106 to 1114 are repeated for every state the other random variable (for example, Y) can take (step 1116). Steps 1104 to 1116 are then repeated for every state the fixed random variable (X) can take (step 1118), after which the components computed so far are added up to obtain the conditional mutual information (step 1120).
FIG. 14 shows the processing flow when the number of random variables (|q|) in the given condition set is 2 or more. First, all combinations of |q| - 1 variables are generated from the condition set (step 1402). The query management unit 112 inquires, for each combination, whether it is stored in the query result cache unit 114 as a search condition key (step 1404). If any combination is not stored ("No" in step 1404), the processing below is skipped and the procedure ends (step 1406).
If all combinations are stored in the query result cache unit 114 ("Yes" in step 1404), it is determined whether P(c (bold)) (where c (bold) is the set of state values corresponding to the condition variables) is below δ (step 1408). If it is ("Yes" in step 1408), the query management unit 112 is not made to compute the conditional mutual information component. If it is at least δ ("No" in step 1408), the random variables and states at this point are used as a search condition key, and the number of records matching the condition is stored as the value in the query result cache unit 114 (step 1410). The set of random variables and states is also added to the frequent item set Fn whose size equals the number n of random variables.
Next, it is determined whether P(x | c (bold)) is 0 (step 1412); if so ("Yes" in step 1412), the conditional mutual information component is not computed. If it is not 0 ("No" in step 1412), it is determined whether P(y | c (bold)) is 0 (step 1414); if so ("Yes" in step 1414), the component is not computed.
If it is not 0 ("No" in step 1414), every subset, smaller by one element, of the set consisting of the two random variables of the current pair and the current condition set is considered (step 1416), and it is determined whether all of those subsets are contained in the frequent item sets (F1, F2, ...) (step 1418). If not ("No" in step 1418), the component is not computed. If they are ("Yes" in step 1418), it is determined whether P(x, y, c (bold)) is below δ (step 1420). If it is ("Yes" in step 1420), the component is not computed. If it is at least δ ("No" in step 1420), the random variables and states at this point are used as a search condition key, and the number of records matching the condition is stored as the value in the query result cache unit 114 (step 1422). The set of random variables and states is also added to the frequent item set Fn whose size equals the number n of random variables.
The conditional mutual information component is then computed for the current combination of random variables and states (step 1424). The state of one random variable (X) of the current pair is fixed, and steps 1414 to 1424 are repeated for every state the other random variable (Y) can take (step 1426). Steps 1412 to 1426 are repeated for every state the fixed random variable (X) can take (step 1428). Finally, all the conditional mutual information components computed so far are added up to obtain the conditional mutual information (step 1430).
FIGS. 15 and 16 show examples of the calcConditionalMI routine and the haveValidCandidate routine used in the processing of FIGS. 13 and 14, respectively.
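Only as an illustrative sketch (the actual routines are those of FIGS. 15 and 16), a pruned computation of equation (7) might be written as follows, reusing the hypothetical count_fn and a dict `frequent` that maps item set size to the frequent item sets of that size.

```python
import math
from itertools import product, combinations

def calc_conditional_mi(x_var, y_var, cond_vars, spec, total, count_fn, frequent, delta):
    """I(X, Y | C), skipping components whenever P(c) or P(x, y, c) is below the
    minimum support delta or some one-smaller subset is not a frequent item set."""
    cmi = 0.0
    for pattern in product(*(spec.states[c] for c in cond_vars)):
        cond = dict(zip(cond_vars, pattern))
        p_c = count_fn(cond) / total
        if p_c < delta or p_c == 0.0:
            continue                                   # prune: P(c) below delta
        for x, y in product(spec.states[x_var], spec.states[y_var]):
            full = {**cond, x_var: x, y_var: y}
            items = frozenset(full.items())            # (variable, state) pairs
            # anti-monotonicity check over all subsets smaller by one element
            if any(frozenset(sub) not in frequent.get(len(items) - 1, set())
                   for sub in combinations(items, len(items) - 1)):
                continue
            p_xyc = count_fn(full) / total
            if p_xyc < delta or p_xyc == 0.0:
                continue                               # prune: P(x, y, c) below delta
            p_xc = count_fn({**cond, x_var: x}) / total
            p_yc = count_fn({**cond, y_var: y}) / total
            # component of equation (7)
            cmi += p_xyc * math.log((p_xyc / p_c) / ((p_xc / p_c) * (p_yc / p_c)))
            frequent.setdefault(len(items), set()).add(items)
    return cmi
```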
Returning to FIG. 3, the information processing apparatus 100 performs edge reduction processing in step 314. The detailed processing is shown in FIG. 17.
For each random variable pair that has an undirected edge, the structure learning unit 110 asks the graph structure construction unit 118 whether another path exists between the pair (step 1702). If the graph structure construction unit 118 reports that there is no other path ("No" in step 1702), the processing of step 1702 moves on to another random variable pair. If the graph structure construction unit 118 reports that another path exists ("Yes" in step 1702), the structure learning unit 110 instructs the graph structure construction unit to temporarily delete the undirected edge between the pair (step 1704). The processing shown in FIGS. 9 and 10 is then executed for the pair to determine whether an edge is needed between it (step 1706). If an edge is needed ("Yes" in step 1706), the structure learning unit 110 instructs that the temporarily deleted undirected edge be added back between the pair (step 1708). If the edge is not needed ("No" in step 1706), the structure learning unit 110 gives no such instruction and the undirected edge remains deleted. Steps 1702 to 1708 are repeated for all random variable pairs that have an undirected edge at the end of the edge increase processing of step 312 (step 1710).
Next, for each random variable pair that currently has an undirected edge, the structure learning unit 110 asks the graph structure construction unit 118 whether at least one of the pair's random variable nodes has three or more neighboring nodes other than the other node of the pair (step 1712). If the graph structure construction unit 118 reports that neither node of the pair has three or more neighbors other than its partner ("No" in step 1712), the structure learning unit 110 moves on to the processing of step 1712 for another pair. If it reports that at least one node of the pair has three or more neighbors other than the other node ("Yes" in step 1712), the structure learning unit 110 instructs the graph structure construction unit to temporarily delete the undirected edge between the pair (step 1714). Then the set consisting of the nodes that lie on a path between the pair's two random variable nodes and are adjacent to either of them, together with the nodes adjacent to those adjacent nodes, is set as the final condition set (step 1716). For this random variable pair and final condition set, the processing from step 904 of FIG. 9 onward and the processing of FIG. 10 are executed to determine whether an edge is needed between the pair (step 1718). If an edge is needed ("Yes" in step 1718), the structure learning unit 110 instructs that the temporarily deleted undirected edge be added back between the pair (step 1720). If it is not needed ("No" in step 1718), the structure learning unit 110 gives no such instruction and the undirected edge remains deleted. Steps 1712 to 1720 are repeated for all random variable pairs that have an undirected edge at the end of steps 1702 to 1710 (step 1722).
As described above, in step 314 it is determined, for each random variable pair that has an edge after the processing of step 312, whether the edge is needed, and the edge is added back if it is needed. Specifically, for a random variable pair that has an undirected edge, if a path other than that undirected edge exists, the undirected edge is temporarily deleted and, in the same manner as in step 312, the temporarily deleted edge is restored if it is needed. Furthermore, if at least one random variable node of a pair having an undirected edge has three or more neighboring nodes other than the other node, the undirected edge is temporarily deleted; then, taking as the final condition set the set of nodes that lie on a path between the pair's two random variable nodes and are adjacent to either of them, together with the nodes further adjacent to those adjacent nodes, the temporarily deleted edge is restored if it is needed, in the same manner as in step 312.
An example of the Thinning (edge reduction) routine is shown in FIG. 18, and an example of the edgeNeeded routine called from that routine is shown in FIG. 19.
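To illustrate how the final condition set of this second sub-step can be built, the following sketch enumerates the neighbours of the pair that still connect it once the direct edge is removed, and then adds their neighbours. It is a minimal illustration in Python, not the routine of FIG. 18 or FIG. 19; the adjacency-set representation and the helper names (can_reach, extended_condition_set) are assumptions introduced here, and treating "lies on a path between the two nodes" as reachability of the opposite endpoint is a simplification.

```python
from collections import deque

def can_reach(adj, start, goal, banned):
    """BFS reachability in the undirected graph `adj`, ignoring the edge `banned`."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return True
        for w in adj[v]:
            if {v, w} == banned or w in seen:
                continue
            seen.add(w)
            queue.append(w)
    return False

def extended_condition_set(adj, x, y):
    """Nodes adjacent to X or Y that still connect the pair once the direct
    X-Y edge is removed, together with the neighbours of those nodes."""
    banned = {x, y}                      # the temporarily deleted edge
    first = set()
    for n in (adj[x] | adj[y]) - {x, y}:
        target = y if n in adj[x] else x
        if can_reach(adj, n, target, banned):
            first.add(n)
    second = set()
    for n in first:
        second |= adj[n]                 # neighbours of the adjacent nodes
    return (first | second) - {x, y}
```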
Returning to FIG. 3, the information processing apparatus 100 executes an orientation process for the undirected edges contained in the graph structure that has been built up in the graph structure construction unit 118 through the operations up to the edge reduction process 314 (step 316). The detailed processing is shown in FIG. 20.
Attention is paid to triples of random variables (here X, Y and Z) such that a directly connecting undirected edge exists between X and Y and between Y and Z, while no directly connecting undirected edge exists between X and Z. For such a triple, the structure learning unit 110 queries the cut set holding unit 116 as to whether either of the following conditions holds: 1) an element having {X, Z} as its record (for example, <{X, Z}, C>) is contained in the global cut set and Y is not contained in the condition variable set (C in this example) stored together with {X, Z} in that element, or 2) no element having {X, Z} as its record is contained in the global cut set (step 2002). If either condition is satisfied ("Yes" in step 2002), the structure learning unit 110 instructs the graph structure construction unit 118 to orient the undirected edges so that X and Z become parents of Y (step 2004). If neither condition is satisfied ("No" in step 2002), the processing of step 2002 is performed for another triple of random variables satisfying the above relationship.
After the processing of steps 2002 and 2004 has been executed for all triples of random variables satisfying the above relationship (step 2006), the structure learning unit 110 turns to triples of random variables (here X, Y and Z) contained in the current vertex set. The structure learning unit 110 queries the graph structure construction unit 118 as to whether all of the following conditions hold: 1) X is a parent of Y, 2) Y and Z are adjacent, 3) X and Z are not adjacent, and 4) the edge between Y and Z is an undirected edge (step 2008). If the conditions are satisfied ("Yes" in step 2008), the structure learning unit 110 instructs the graph structure construction unit 118 to orient the edge so that Y becomes a parent of Z (step 2010). If the conditions are not satisfied ("No" in step 2008), the processing of step 2008 is executed for another triple of random variables contained in the vertex set. The processing of steps 2008 and 2010 is performed for all triples of random variables contained in the vertex set (step 2012).
Further, for each random variable pair X, Y that still has an undirected edge, the structure learning unit 110 asks the graph structure construction unit 118 whether a valid path exists between the two nodes of the pair (step 2014). If such a path exists ("Yes" in step 2014), the structure learning unit 110 instructs the graph structure construction unit 118 to orient the edge so that X becomes a parent of Y (step 2016). The processing of steps 2014 and 2016 is performed for all undirected edges (step 2018).
An example of the orientEdge routine used for edge orientation is shown in FIG. 21.
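The three orientation passes described above can be sketched as follows. This is a hypothetical illustration, not the orientEdge routine of FIG. 21; the data representation (sets of undirected and directed edges, a dictionary of cut-set records) and all names are assumptions, and checking both directions in the third pass is one interpretation of "a valid path exists between the pair".

```python
from itertools import permutations

def orient_edges(nodes, undirected, directed, cut_sets):
    """Sketch of the orientation passes (steps 2002-2018).
    undirected: set of frozenset({a, b}); directed: set of (parent, child) tuples;
    cut_sets: dict mapping frozenset({X, Z}) to its condition variable set."""
    def adjacent(a, b):
        return frozenset({a, b}) in undirected or (a, b) in directed or (b, a) in directed

    def orient(parent, child):
        undirected.discard(frozenset({parent, child}))
        directed.add((parent, child))

    # Pass 1: colliders X -> Y <- Z (steps 2002-2006).
    for x, y, z in permutations(nodes, 3):
        if (frozenset({x, y}) in undirected and frozenset({y, z}) in undirected
                and not adjacent(x, z)):
            key = frozenset({x, z})
            if key not in cut_sets or y not in cut_sets[key]:
                orient(x, y)
                orient(z, y)

    # Pass 2: X -> Y, Y - Z undirected, X and Z not adjacent => orient Y -> Z (steps 2008-2012).
    for x, y, z in permutations(nodes, 3):
        if (x, y) in directed and frozenset({y, z}) in undirected and not adjacent(x, z):
            orient(y, z)

    # Pass 3: a still-undirected X - Y with a directed path between its nodes
    # is oriented along that path (steps 2014-2018).
    def has_directed_path(src, dst):
        stack, seen = [src], {src}
        while stack:
            v = stack.pop()
            for p, c in directed:
                if p == v and c not in seen:
                    if c == dst:
                        return True
                    seen.add(c)
                    stack.append(c)
        return False

    for pair in list(undirected):
        x, y = tuple(pair)
        if has_directed_path(x, y):
            orient(x, y)
        elif has_directed_path(y, x):
            orient(y, x)
```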
The graph structure held by the graph structure construction unit 118 after the execution of step 316 in FIG. 3 represents the Bayesian network structure learned through the series of processes of this embodiment. The information processing apparatus 100 outputs it as the Bayesian network structure description file 120 (step 318) and completes the processing (step 320).
The information processing apparatus 100 described here consists of the plurality of components shown in FIG. 1, but such a configuration is merely an example. That is, the learning apparatus of the present invention may be configured so that the functions of two or more of the control unit 102, the data specification analysis unit 104, the structure learning unit 110, the query management unit 112, the query result cache unit 114, the cut set holding unit 116 and the graph structure construction unit 118 are executed by a single component. All of these functions may also be executed by a single component (for example, a processor of a computer). The information processing apparatus 100 may also include the database 106.
In the embodiment described above, the present invention has been explained as being implemented as the information processing apparatus 100. However, the present invention can also be realized as a program that causes a computer to operate as the components shown in FIG. 1, or as a program that causes a computer to execute the steps shown in FIG. 3.
This embodiment is characterized in that Bayesian network structure learning is performed using a new algorithm in which a frequent item set extraction algorithm is combined with a constraint-based learning algorithm. For concreteness, the technical idea of the present invention has been explained using a specific constraint-based learning algorithm and a specific frequent item set extraction algorithm. The technical idea of the present invention can, however, also be realized using other constraint-based learning algorithms and other frequent item set extraction algorithms.
The information processing apparatus and program according to this embodiment do not simply use the two algorithms side by side. That is, they do not operate as a naive composition in which the output of the frequent item set extraction algorithm is taken as a joint probability distribution and passed unchanged as input to the constraint-based learning algorithm. As already described, the present invention uses an original algorithm in which the frequent item set extraction algorithm is embedded inside the constraint-based learning algorithm, and it operates so that only the patterns of joint random variable values that actually need to be processed during structure learning become targets of frequent item set extraction. Moreover, when a constraint-based learning algorithm is used, the number of random variables given as the condition of a conditional mutual information computation does not necessarily increase monotonically as the algorithm progresses, whereas in a frequent item set extraction algorithm the number of items handled (in the present invention, pairs of a random variable and the value it takes) increases monotonically as the algorithm progresses. The present invention achieves an effective embedding of the frequent item set extraction algorithm by modifying the constraint-based learning algorithm so that the number of random variables handled keeps increasing monotonically locally.
An experiment was conducted to confirm the effect of this embodiment in speeding up Bayesian network structure learning and stabilizing the processing time. As the Bayesian network underlying the experiment, a 37-node, 46-edge network called Alarm was used, taken from the Bayesian Network Repository (http://compbio.cs.huji.ac.il/Repository), which is frequently used as a benchmark for Bayesian network learning. The Bayesian network used for the experiment is shown in FIG. 22. Data sets of 5000, 15000, 30000 and 50000 records were generated for this network, and Bayesian network structure learning was performed on these data sets both with the conventional TPDA algorithm and with the algorithm of this embodiment, the latter by operating a computer as the information processing apparatus of this embodiment.
Tables 1 to 4 show the experimental results, including average values, obtained when five data sets were run for each of the above data sizes. Missing Edge and Extra Edge indicate, respectively, the number of edges lost and the number of extra edges added in the estimated Bayesian network compared with the correct Bayesian network. Each table further shows the execution time and standard deviation of each algorithm normalized so that the execution time of the conventional TPDA algorithm is 1, as well as the execution time of each algorithm normalized so that its own execution time for 5000 records is 1.
The experimental results show that, compared with the conventional TPDA algorithm, this embodiment can greatly speed up Bayesian network structure learning and greatly reduce the variation in execution time. It can also be seen that the error with respect to the correct network, as indicated by Missing Edge and Extra Edge, is kept at a level comparable to that of the prior art. Thus, compared with the prior art, this embodiment achieves the excellent effect of speeding up structure learning and stabilizing the execution time without sacrificing the accuracy of the estimated Bayesian network.
Next, another embodiment of the present invention is described in detail below. In this embodiment, when searching for a cut set or determining whether a cut set exists, the conditional mutual information is computed in ascending order of the subset size of the condition variable set, and the MDF assumption is exploited as early as possible: the conditional mutual information given two condition variables is compared with the conditional mutual information given each single condition variable of the condition variable set, and all variables that do not belong to a cut set are removed from the search space at the second stage. This makes it possible to complete the search in at most three stages and thereby further speeds up Bayesian network structure learning.
The configuration of the information processing apparatus 100 of this embodiment is shown in FIG. 1, and the basic processing performed by the information processing apparatus 100 in this embodiment is as shown in FIG. 3. Upon receiving an instruction to execute Bayesian network structure learning, the information processing apparatus 100 starts processing (step 302). The instruction is configured to include connection information for accessing the database 106 that stores the data on which Bayesian network structure learning is based, and the name of the data specification description file. In addition to the above, the operation parameters include the threshold ε (for example, 0.01) for the mutual information and conditional mutual information used in Bayesian network structure learning. The file name of the Bayesian network structure description file to be output may further be included.
The information processing apparatus 100 then performs initial processing. The control unit 102 checks whether the operation parameters are normal (step 304); if there is an error, the processing is terminated (step 320), and if they are normal, the control unit 102 causes the data specification analysis unit 104 to analyze the data specification (step 306). The data specification analysis unit 104 reads the data specification description file 108 and holds the name of each random variable contained in the data, the number of random variables, the names of all the states each random variable can take, and the number of states. Next, the data specification analysis unit 104 connects to the database 106 using the database connection information, acquires the total number of records, and holds it (step 308). After step 308, the control unit 102 transfers control to the structure learning unit 110.
The structure learning unit 110 performs the tree structure preparation process and generates a tree structure for the given data (step 310). The processing of step 310 is shown in more detail in FIG. 23.
The query management unit 112 calculates the mutual information for all random variable pairs (step 2302). When the mutual information is ε or more, the structure learning unit 110 adds the random variable pair to a random variable pair array stored, for example, in a storage unit (not shown) of the information processing apparatus 100.
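A minimal sketch of step 2302, estimating the pairwise mutual information from tabular records and keeping the pairs that reach the threshold ε, might look as follows. The record format and function names are assumptions introduced here; natural logarithms are used, and any base works if used consistently.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(records, x, y):
    """Empirical mutual information I(X; Y) from a list of dict-like records."""
    n = len(records)
    pxy = Counter((r[x], r[y]) for r in records)
    px = Counter(r[x] for r in records)
    py = Counter(r[y] for r in records)
    # sum over observed (a, b): p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((nxy / n) * log((nxy * n) / (px[a] * py[b]))
               for (a, b), nxy in pxy.items())

def candidate_pairs(records, variables, eps):
    """Pairs whose mutual information reaches the threshold eps (step 2302)."""
    scored = [((x, y), mutual_information(records, x, y))
              for x, y in combinations(variables, 2)]
    return [(pair, mi) for pair, mi in scored if mi >= eps]
```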
Next, the structure learning unit 110 sorts the random variable pairs stored in the random variable pair array in descending order of mutual information (step 2304). Then, in descending order of mutual information, it asks the graph structure construction unit 118 whether the graph structure would remain a tree if an edge were added between the pair (step 2306). If adding the edge would create a cycle, the graph structure construction unit 118 notifies the structure learning unit 110 that the structure would no longer be a tree ("No" in step 2306). If, on the other hand, the graph structure construction unit 118 notifies that the tree structure would be preserved even if the edge were added ("Yes" in step 2306), the structure learning unit 110 instructs the graph structure construction unit 118 to add an undirected edge between the pair currently under consideration and deletes that pair from the random variable pair array (step 2308). Steps 2306 and 2308 are repeated for all random variable pairs in the random variable pair array (step 2310).
As described above, in the processing of step 310, a tree-structured graph is generated by adding, for each random variable pair whose mutual information is ε or more, an edge between the pair only when the graph structure remains a tree after the edge is added. The generated graph structure may be stored in the graph structure construction unit 118, in a storage unit (not shown) of the information processing apparatus 100, in the structure learning unit 110, or elsewhere.
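The tree construction of steps 2304 to 2310 can be illustrated with a union-find (Kruskal-style) sketch: pairs are taken in descending order of mutual information, and an edge is added only when it does not close a cycle. The helper names are assumptions, and `scored_pairs` is the output of the pair-selection sketch above. Restricting additions to edges that do not close a cycle is exactly what keeps the Drafting output a tree.

```python
class DisjointSet:
    """Union-find used to detect whether adding an edge would create a cycle."""
    def __init__(self, nodes):
        self.parent = {n: n for n in nodes}

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]   # path halving
            n = self.parent[n]
        return n

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False        # already connected: adding (a, b) would close a cycle
        self.parent[ra] = rb
        return True

def draft_tree(variables, scored_pairs):
    """Steps 2304-2310: scored_pairs is [((x, y), mi), ...] with mi >= eps."""
    dsu, edges, remaining = DisjointSet(variables), [], []
    for (x, y), mi in sorted(scored_pairs, key=lambda p: p[1], reverse=True):
        if dsu.union(x, y):
            edges.append((x, y))       # tree is preserved: add the undirected edge
        else:
            remaining.append((x, y))   # kept in the pair array for the Thickening phase
    return edges, remaining
```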
Returning again to FIG. 3, the structure learning unit 110 executes the edge increasing process in step 312. For each random variable pair whose mutual information is ε or more but to which no undirected edge was added in the tree structure preparation process because the addition would have destroyed the tree structure (that is, each pair remaining in the random variable pair array), the structure learning unit 110 determines, using conditional mutual information, whether an edge is actually needed, and adds an undirected edge between the pair when it is judged to be needed. An example of the Thickening (edge increasing) routine at this point is shown in FIG. 24.
FIG. 25 shows the details of the main processing executed within the Thickening routine in this embodiment. For each random variable pair whose mutual information is ε or more but which does not have an undirected edge (that is, each pair remaining in the random variable pair array), the structure learning unit 110 sets, as the final condition set (ConditionSet, denoted Z (bold) in this embodiment), the set of nodes that lie on a path whose start point is one of the two random variable nodes (for example, X and Y) of the pair and whose end point is the other, and that are adjacent to either of those two nodes (step 2402). A candidate condition set Zc (bold) having the same random variables as the final condition set is also generated.
The structure learning unit 110 then causes the query management unit 112 to calculate the conditional mutual information for one random variable contained in the final condition set (for example, {Z1, Z2, Z3, Z4, ...}), that is, for one of the random variables that lie on a path between X and Y and are adjacent to X or Y.
The query management unit 112 first calculates the conditional mutual information I(X, Y | Z (bold)) for the case where only one random variable (for example, Z1) among the random variables contained in the final condition set Z (bold) is included in the condition set (step 2404). The calculated conditional mutual information may be stored in the query management unit 112, in a storage unit (not shown) of the information processing apparatus 100, or elsewhere, in association with the condition set used for the calculation (here, the set containing only Z1).
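The conditional mutual information used in these steps can be estimated from empirical counts as in the following sketch. This is an illustration only; the record format and names are assumptions, and the skipping of low-probability patterns described elsewhere in this document is omitted for brevity.

```python
from collections import Counter
from math import log

def conditional_mutual_information(records, x, y, cond):
    """Empirical I(X; Y | C) for a list of condition variables `cond`,
    summing p(a, b, c) * log( p(a, b | c) / (p(a | c) * p(b | c)) )."""
    n = len(records)
    key_c = lambda r: tuple(r[v] for v in cond)
    n_xyc = Counter((r[x], r[y], key_c(r)) for r in records)
    n_xc = Counter((r[x], key_c(r)) for r in records)
    n_yc = Counter((r[y], key_c(r)) for r in records)
    n_c = Counter(key_c(r) for r in records)
    total = 0.0
    for (a, b, c), nabc in n_xyc.items():
        # p(a,b|c) / (p(a|c) p(b|c)) = (n_abc * n_c) / (n_ac * n_bc)
        total += (nabc / n) * log((nabc * n_c[c]) / (n_xc[(a, c)] * n_yc[(b, c)]))
    return total
```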
The structure learning unit 110 determines whether the conditional mutual information calculated in step 2404 is less than ε (step 2406). If it is less than ε ("Yes" in step 2406), the pair consisting of the random variable pair ({X, Y}) and the condition set at this point (for example, {Z1}) is stored in the global cut set held in the cut set holding unit 116, and it is judged that no edge is needed between the random variable pair (step 2408).
If, on the other hand, the conditional mutual information calculated in step 2404 with the single random variable as the condition set is ε or more ("No" in step 2406), the processing returns to step 2404 and the conditional mutual information is calculated for the case where only another single random variable (for example, Z2) is included in the condition set. Steps 2404 and 2406 are repeated in the same manner; that is, the calculation of the conditional mutual information given a single random variable contained in the final condition set Z (bold) is repeated. Each calculated conditional mutual information may be stored in the query management unit 112, in a storage unit (not shown) of the information processing apparatus 100, or elsewhere, in association with the condition set used for the calculation. If a random variable for which the conditional mutual information is less than ε is found in this process ("Yes" in step 2406), the processing proceeds to step 2408.
If, on the other hand, every conditional mutual information calculated in the executions of steps 2404 and 2406 is ε or more, the processing proceeds to step 2410. The query management unit 112 calculates the conditional mutual information I(X, Y | Z (bold)) for the case where two random variables (for example, Z1 and Z2) among the random variables contained in the final condition set Z (bold) are included in the condition set (step 2410). The calculated conditional mutual information may be stored in the query management unit 112, in a storage unit (not shown) of the information processing apparatus 100, or elsewhere, in association with the condition set used for the calculation (here, {Z1, Z2}).
The structure learning unit 110 determines whether the conditional mutual information calculated in step 2410 is less than ε (step 2412). If it is less than ε ("Yes" in step 2412), the pair consisting of the random variable pair ({X, Y}) and the condition set at this point (for example, {Z1, Z2}) is stored in the global cut set in the cut set holding unit 116, and it is judged that no edge is needed between the random variable pair (here, X and Y) (step 2408).
If, on the other hand, the conditional mutual information calculated in step 2410 is ε or more ("No" in step 2412), the structure learning unit 110 determines whether that conditional mutual information is larger than the conditional mutual information already calculated in step 2404 with one of the two random variables of the condition set used in step 2410 (for example, Z1) as the condition set (step 2414). If it is larger ("Yes" in step 2414), the structure learning unit 110 deletes that random variable (here, Z1) from the candidate condition set Zc (bold) stored in the storage unit (not shown) or elsewhere. The structure learning unit 110 performs the same processing for the other of the two random variables contained in the condition set used in the calculation of step 2410 (for example, Z2) (steps 2418 and 2420).
Thereafter, the structure learning unit 110 repeats steps 2410 to 2420 for all random variables remaining in the candidate condition set Zc (bold) (step 2422). For example, if the initial candidate condition set Zc (bold) contained the six random variables Z1 to Z6 and Z1 and Z2 were deleted from it through the processing of steps 2414 to 2420, the processing of steps 2410 to 2420 is executed again for the remaining Z3 to Z6.
Next, the query management unit 112 calculates the conditional mutual information, taking as the condition set all the random variables remaining in the candidate condition set Zc (bold) as a result of the above processing (step 2424). The structure learning unit 110 determines whether the conditional mutual information calculated in step 2424 is less than ε (step 2426). If it is less than ε ("Yes" in step 2426), the pair consisting of the random variable pair ({X, Y}) and the candidate condition set is stored in the global cut set in the cut set holding unit 116, and it is judged that no edge is needed between the random variable pair (here, X and Y) (step 2408). If, on the other hand, the conditional mutual information calculated in step 2424 is not less than ε ("No" in step 2426), the structure learning unit 110 judges that an edge is needed between the random variable pair and instructs the graph structure construction unit 118 to that effect (step 2428). In this way, an edge is added between a random variable pair when it is judged that an edge is needed. The generated graph structure may be stored in the graph structure construction unit 118, in a storage unit (not shown) of the information processing apparatus 100, in the structure learning unit 110, or elsewhere.
As shown in FIG. 25, when searching for a cut set or determining whether a cut set exists, this embodiment requires at most only three stages: the processing for a condition variable set of size 1 (first stage, steps 2404, 2406 and 2408), the processing for a condition variable set of size 2 (second stage, steps 2410 to 2422 and 2408), and the processing for all the random variables left in the candidate condition set after the first and second stages (third stage, steps 2424 to 2428 and 2408). In contrast, the conventional TPDA searches for a cut set starting from the larger subset sizes of the given condition variable set (that is, in descending order) and consequently requires up to N-2 stages of search, where N is the number of random variables. Therefore, when Bayesian network structure learning is performed in a situation where a large number of random variables exist, this embodiment can make the processing far faster than the conventional TPDA.
FIG. 26 shows an example of the edgeNeeded_H routine that is called in the Thickening routine of FIG. 24 and used for the processing of FIG. 25, and FIG. 27 shows an example of the SearchCutSet routine called in the edgeNeeded_H routine. The details of the routine of FIG. 27 have already been described in relation to FIG. 25.
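Putting the three stages together, a cut-set search of the kind described for FIG. 25 might be sketched as follows. This is not the SearchCutSet routine of FIG. 27 itself: the iteration order over condition-variable pairs in the second stage and all names are assumptions, and it relies on the conditional_mutual_information sketch given earlier. The function returns a cut set when one is found and None when an edge is judged necessary.

```python
def search_cut_set(records, x, y, condition_set, eps):
    """Three-staged cut-set search sketched from the description of FIG. 25."""
    # Stage 1: single condition variables.
    single_cmi = {}
    for z in condition_set:
        cmi = conditional_mutual_information(records, x, y, [z])
        if cmi < eps:
            return [z]
        single_cmi[z] = cmi

    # Stage 2: pairs of condition variables; a variable whose single-variable
    # value is exceeded by the pair value is pruned (MDF assumption).
    candidates = list(condition_set)
    changed = True
    while changed:
        changed = False
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                zi, zj = candidates[i], candidates[j]
                cmi = conditional_mutual_information(records, x, y, [zi, zj])
                if cmi < eps:
                    return [zi, zj]
                drop = [z for z in (zi, zj) if cmi > single_cmi[z]]
                if drop:
                    candidates = [z for z in candidates if z not in drop]
                    changed = True
                    break
            if changed:
                break

    # Stage 3: all surviving candidates at once.
    if candidates and conditional_mutual_information(records, x, y, candidates) < eps:
        return candidates
    return None
```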
Returning to FIG. 3, the information processing apparatus 100 performs the edge reduction process in step 314. The detailed processing is shown in FIG. 28.
For each random variable pair that has an undirected edge, the structure learning unit 110 asks the graph structure construction unit 118 whether another path exists between the pair (step 2702). When it is notified by the graph structure construction unit 118 that there is no other path ("No" in step 2702), it moves on to the processing of step 2702 for another random variable pair. When it is notified that another path exists ("Yes" in step 2702), the structure learning unit 110 instructs the graph structure construction unit 118 to temporarily delete the undirected edge between the pair (step 2704). The processing shown in FIG. 25 is then executed for the pair, and it is determined whether an edge is needed between the pair (step 2706). As already described, in this embodiment the processing of step 2706 can be executed in at most three stages, so the processing can also be made faster here than with the conventional TPDA. If an edge is needed ("Yes" in step 2706), the structure learning unit 110 instructs the graph structure construction unit 118 to add the temporarily deleted undirected edge again between the pair (step 2708). If the edge is not needed ("No" in step 2706), the structure learning unit 110 gives no such instruction and the undirected edge remains deleted. The processing of steps 2702 to 2708 is repeated for all random variable pairs that have an undirected edge at the time the edge increasing process 312 is completed (step 2710).
Subsequently, for each random variable pair that currently has an undirected edge, the structure learning unit 110 asks the graph structure construction unit 118 whether at least one of the random variable nodes forming the pair has three or more adjacent nodes other than its pair partner (step 2712). When it is notified by the graph structure construction unit 118 that neither node of the pair has three or more adjacent nodes other than its partner ("No" in step 2712), the structure learning unit 110 moves on to the processing of step 2712 for another random variable pair. When it is notified that at least one node of the pair has three or more adjacent nodes other than the other node ("Yes" in step 2712), the structure learning unit 110 instructs the graph structure construction unit to temporarily delete the undirected edge between the pair (step 2714). It then sets, as the final condition set, the set consisting of the nodes that lie on a path between the two random variable nodes of the pair and are adjacent to either of them, together with the nodes further adjacent to those adjacent nodes (step 2716). The processing from step 2404 onward in FIG. 25 is then executed for the random variable pair and the final condition set, and it is determined whether an edge is needed between the pair (step 2718). As already described, a speed-up can be expected in this processing as well according to this embodiment. If an edge is needed ("Yes" in step 2718), the structure learning unit 110 instructs the graph structure construction unit 118 to add the temporarily deleted undirected edge again between the pair (step 2720). If the edge is not needed ("No" in step 2718), the structure learning unit 110 gives no such instruction and the undirected edge remains deleted. The processing of steps 2712 to 2720 is repeated for all random variable pairs that have an undirected edge at the time the processing of steps 2702 to 2710 is completed (step 2722).
As described above, in step 314, for each random variable pair that has an edge after the processing of step 312, it is determined whether the edge is necessary, and the edge is added (restored) only when it is judged necessary. Specifically, for a random variable pair having an undirected edge, if a path other than that undirected edge exists, the undirected edge is temporarily deleted and is added again only when it is judged necessary in the same manner as in step 312. Furthermore, when at least one random variable node of a pair having an undirected edge has three or more adjacent nodes other than the other node, the undirected edge is temporarily deleted; then, taking as the final condition set the set consisting of the nodes that lie on a path between the two random variable nodes of the pair and are adjacent to either of them, together with the nodes further adjacent to those adjacent nodes, the temporarily deleted undirected edge is added again only when it is judged necessary, in the same manner as in step 312.
FIG. 29 shows an example of the Thinning (edge reduction) routine in this embodiment, and FIG. 30 shows an example of the edgeNeeded routine called from that routine in this embodiment.
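An illustrative driver for the two passes of this edge reduction phase is sketched below. It operates on a plain adjacency-set dictionary and relies on the can_reach, extended_condition_set and search_cut_set sketches given earlier; all of this is an assumption-laden illustration, not the routine of FIG. 29.

```python
def thinning(records, adj, eps):
    """Illustrative driver for the edge reduction phase (steps 2702-2722).
    adj: dict mapping each node name (assumed comparable, e.g. strings)
    to the set of its neighbours in the current undirected graph."""
    def remove(x, y):
        adj[x].discard(y); adj[y].discard(x)

    def add(x, y):
        adj[x].add(y); adj[y].add(x)

    def neighbours_on_paths(x, y):
        # Condition set of FIG. 25: neighbours of X or Y lying on a path between them.
        return [n for n in (adj[x] | adj[y]) - {x, y}
                if can_reach(adj, n, y if n in adj[x] else x, {x, y})]

    # First pass (steps 2702-2710): pairs connected by some path other than their edge.
    edges = [(x, y) for x in adj for y in adj[x] if x < y]
    for x, y in edges:
        remove(x, y)                                       # temporary deletion
        if can_reach(adj, x, y, {x, y}):                   # another path exists
            cond = neighbours_on_paths(x, y)
            if search_cut_set(records, x, y, cond, eps) is not None:
                continue                                   # cut set found: edge stays deleted
        add(x, y)                                          # otherwise restore the edge

    # Second pass (steps 2712-2722): an endpoint has three or more other neighbours.
    edges = [(x, y) for x in adj for y in adj[x] if x < y]
    for x, y in edges:
        if max(len(adj[x] - {y}), len(adj[y] - {x})) < 3:
            continue
        remove(x, y)
        cond = list(extended_condition_set(adj, x, y))     # neighbours plus their neighbours
        if search_cut_set(records, x, y, cond, eps) is None:
            add(x, y)                                      # edge judged necessary: restore it
```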
Returning to FIG. 3, the information processing apparatus 100 executes the orientation process for the undirected edges contained in the graph structure generated by the graph structure construction unit 118 through the operations up to the edge reduction process 314 (step 316). The detailed processing has already been described in relation to FIG. 20, and FIG. 31 shows an example of the routine used for edge orientation.
The graph structure held by the graph structure construction unit 118 (or stored in a storage unit (not shown) of the information processing apparatus 100, or elsewhere) after the execution of step 316 in FIG. 3 represents the Bayesian network structure learned through the series of processes of this embodiment. The information processing apparatus 100 outputs it as the Bayesian network structure description file 120 (step 318) and completes the processing (step 320).
The information processing apparatus 100 described in connection with this embodiment consists of the plurality of components shown in FIG. 1, but such a configuration is merely an example. That is, the information processing apparatus of this embodiment may be configured so that the functions of two or more of the control unit 102, the data specification analysis unit 104, the structure learning unit 110, the query management unit 112, the query result cache unit 114, the cut set holding unit 116 and the graph structure construction unit 118 are executed by a single component. All of these functions may also be executed by a single component (for example, a processor of a computer). The information processing apparatus 100 may include the database 106.
The reduction in the amount of computation and the speed-up achieved by this embodiment are now considered. First, consider the maximum amount of computation in the cut set search test under the method of this embodiment. In the first stage of the cut set search test (corresponding to steps 2404 and 2406 in FIG. 25), the condition variable set size is 1, so the number of variables handled in a CI test (the determination of whether the conditional mutual information is less than ε) is 3, including X and Y. Therefore, the number of patterns per CI test in the first stage is r^3, where r is the maximum number of states of a variable. Similarly, in the second stage (corresponding to steps 2410 to 2422 in FIG. 25), the number of patterns is r^4. In the third stage (corresponding to steps 2424 to 2428 in FIG. 25), all the variables may have to be handled in the CI test in the worst case, so the number of patterns is r^N. Multiplying the number of CI tests in each stage by the corresponding number of patterns and summing, the maximum amount of computation in this embodiment is O(r^N + N^2 r^4).
Next, consider the maximum amount of computation in the cut set search test under the conventional TPDA. Consider performing the cut set search test for X and Y given Z (bold), that is, enumerating the subsets Z (bold)' ⊆ Z (bold) and performing a CI test for each. With r the maximum number of states a variable can take, the condition variable set size |Z (bold)'| is at most N-2, and the number of patterns per CI test is then r^N. In TPDA, the size of the condition variable set decreases one by one from |Z (bold)'| = N-2 at the first stage, and at the last, (N-2)-th stage the condition variable set size is |Z (bold)'| = 1. Therefore, the numbers of CI test patterns in TPDA are r^N, r^(N-1), ..., r^3 in the first, second, ..., (N-2)-th stages of the cut set search test, respectively. The maximum amount of computation of TPDA is thus obtained by multiplying the number of CI tests in each stage by the corresponding number of patterns and summing:

O(r^N + r^(N-1) + ... + r^3)   ... (9)
Comparing the maximum amount of computation O(r^N + N^2 r^4) of this embodiment with the maximum amount of computation of TPDA expressed by expression (9) above, the difference can be analyzed by removing from each the term r^N that is common to both methods. That is, with the common term removed, TPDA is left with

O(r^(N-1) + r^(N-2) + ... + r^3) = O(r^(N-1)),

whereas this embodiment is left with O(N^2 r^4). Thus, in TPDA the remaining difference O(r^(N-1)) is of exponential order in N, whereas in this embodiment the corresponding difference is of polynomial order in both N and r. It was therefore confirmed that, compared with TPDA, this embodiment can reduce the maximum amount of computation in the cut set search test.
On the other hand, in this embodiment the minimum amount of computation for a cut set search test is O(r^3). This is because, according to this embodiment, if a cut set is found in the first CI test of the first stage, the processing of the second and third stages is unnecessary, and only the r^3 patterns of that single CI test need to be computed.
In contrast, in TPDA, when a cut set for the variables X and Y is searched for from the condition variable set Z (bold), the subsets Z (bold)' ⊆ Z (bold) of the condition variable set are enumerated in descending order of subset size, and a CI test is performed to check whether the subset under consideration is a cut set. Since the amount of computation of a CI test is of exponential order in the size of the given variable set, taking r as the maximum number of states a variable can take and N-2 as the largest possible size of the given variable set |Z (bold)'|, the amount of computation is O(r^N). Therefore, according to TPDA, the minimum amount of computation for a cut set search test is O(r^N). It was thus confirmed that, as the number of random variables increases, this embodiment also reduces the amount of computation compared with TPDA with respect to the minimum amount of computation in the cut set search test.
As described above, it can be seen that the method according to this embodiment has the remarkable effect of reducing both the maximum and the minimum amount of computation compared with TPDA.
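Under the dominant-term reading of the comparison above (and of expression (9) as reconstructed here), the per-test costs can be summarized as follows:

```latex
% Cost of one cut-set search test (r: maximum number of states, N: number of variables).
% Dominant-term reading of the comparison above.
\begin{align*}
  \text{TPDA (worst case)}            &:\; O\!\bigl(r^{N} + r^{N-1} + \cdots + r^{3}\bigr) = O\!\bigl(r^{N}\bigr) \\
  \text{TPDA (best case)}             &:\; O\!\bigl(r^{N}\bigr) \\
  \text{This embodiment (worst case)} &:\; O\!\bigl(r^{N} + N^{2} r^{4}\bigr) \\
  \text{This embodiment (best case)}  &:\; O\!\bigl(r^{3}\bigr)
\end{align*}
```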
In this embodiment, the present invention has been described as being implemented as the information processing apparatus 100. However, it will be apparent to those skilled in the art that the present invention can be realized as a program that causes a computer to operate as some or all of the components shown in FIG. 1, and that it can likewise be realized as a program that causes a computer to execute some or all of the steps shown in FIG. 3.
As described in relation to FIG. 25 and FIG. 27, this embodiment speeds up the search for cut sets in Bayesian network structure learning in situations where a large number of random variables exist, by executing a procedure consisting of at most three stages when searching for a cut set or determining whether a cut set exists: the processing for a condition variable set of size 1, the processing for a condition variable set of size 2, and the processing for all the random variables left in the candidate condition set after these two stages. As a result, Bayesian network structure learning that is even faster than the conventional TPDA becomes possible.
An experiment was conducted to confirm the effect of this embodiment in speeding up Bayesian network structure learning and stabilizing the processing time. The algorithms were implemented in Java (registered trademark). The environment used for the experiment was Windows (registered trademark) Vista Business SP2 running on an Intel Core 2 processor at 2.67 GHz with 4 GB of RAM, and a maximum of 512 MB of memory was allocated to the Java (registered trademark) virtual machine.
As in the first embodiment, the Bayesian network shown in FIG. 22 was used as the basis of the experiment. Data sets of 5000, 15000, 30000 and 50000 records were generated for this network, and Bayesian network structure learning was performed on these data sets both with the conventional TPDA algorithm and with the algorithm of this embodiment, the latter by operating a computer as the information processing apparatus of this embodiment.
Tables 1 to 4 show the experimental results, including average values, obtained when ten data sets were run for each of the above data sizes. The results obtained with the conventional TPDA are shown in the "TPDA" rows, and the results of this embodiment are shown in the "TS (Three-Staged)-TPDA" rows. Missing Edge and Extra Edge indicate, respectively, the number of edges lost and the number of extra edges added in the estimated Bayesian network compared with the correct Bayesian network.
The experimental results show that, compared with the conventional TPDA algorithm, this embodiment can speed up Bayesian network structure learning very substantially and can reduce the variation in execution time very substantially. It can also be seen that the error with respect to the correct network, as indicated by Missing Edge and Extra Edge, is kept at a level comparable to that of the prior art. Thus, compared with the prior art, this embodiment achieves the excellent effect of greatly speeding up structure learning and stabilizing the execution time without sacrificing the accuracy of the estimated Bayesian network.
Claims (18)
- 複数の確率変数及び各確率変数が取る状態についての情報を含む入力データからベイジアンネットワーク構造学習を行う情報処理装置であって、
(A)前記入力データについて木構造のグラフを生成する手段であって、相互情報量が第1の閾値以上である確率変数ペアの各々について、該確率変数ペア間にエッジを追加してもグラフ構造が木構造のままである場合にエッジを追加する、木構造のグラフを生成する手段と、
(B)相互情報量が前記第1の閾値以上でありながら前記手段(A)によってエッジが追加されなかった各確率変数ペアについて、エッジが必要である場合にエッジを追加する手段であって、
該エッジが追加されなかった確率変数ペアを構成する2つの確率変数ノードについて、その間のパス上にあり該2つの確率変数ノードのいずれかに隣接するノードの集合に含まれる確率変数の組を条件集合として条件付き相互情報量を計算し、その値が前記第1の閾値未満となる組が見つかった場合には、該2つの確率変数間にエッジを追加せず、そうでなければエッジを追加し、
前記条件付き相互情報量の計算において、該2つの確率変数の状態とそれぞれの確率変数に対応する状態集合とについての同時確率分布が前記第1の閾値以下の第2の閾値未満となる場合には、関連する成分の計算を省略する、エッジを追加する手段と
を具備する情報処理装置。 An information processing apparatus that performs Bayesian network structure learning from input data including information about a plurality of random variables and the states that each random variable takes,
(A) A means for generating a tree-structured graph for the input data, the graph even if an edge is added between the random variable pairs for each of the random variable pairs whose mutual information amount is equal to or greater than the first threshold value. Means for generating a graph of the tree structure, adding edges if the structure remains a tree structure;
(B) A means for adding an edge when an edge is necessary for each random variable pair in which the mutual information amount is not less than the first threshold and the edge is not added by the means (A),
For two random variable nodes constituting a random variable pair to which the edge has not been added, a set of random variables included in a set of nodes on a path between them and adjacent to one of the two random variable nodes If conditional mutual information is calculated as a set and a pair whose value is less than the first threshold is found, no edge is added between the two random variables, otherwise an edge is added And
In the calculation of the conditional mutual information, when the joint probability distribution for the states of the two random variables and the state set corresponding to each random variable is less than a second threshold equal to or less than the first threshold Is an information processing apparatus comprising means for adding an edge, omitting calculation of related components. - 前記手段(A)は、
- The information processing apparatus according to claim 1, wherein
the means (A) computes the marginal probability of each random variable contained in the input data, and computes the mutual information of each random variable pair while omitting the computation of a mutual-information component when the marginal probability that either random variable of the pair takes the corresponding state is less than the second threshold; and
the means (B), for the two random variable nodes of a pair to which no edge was added, takes as a final condition set the set of nodes that lie on a path between the two nodes and are adjacent to either of them, computes the conditional mutual information for the two nodes and a condition set C while increasing the size of C from 1 up to the size of the final condition set, adds no edge between the two random variables if a conditional mutual information less than the first threshold is obtained in that process, and otherwise judges an edge to be necessary and adds it;
the information processing apparatus further comprising:
(C) means for judging, for each random variable pair that has an edge after the processing by the means (B), whether the edge is necessary, and deleting the edge if it is unnecessary; and
(D) means for orienting each edge.
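A corresponding sketch of the means (A) optimization, again assuming the same record format and with delta standing in for the second threshold, might look as follows; mutual-information components tied to very rare states are never evaluated:

```python
from collections import Counter
from math import log


def mutual_information(records, x, y, delta=1e-4):
    """Estimate I(X; Y) from `records` (a list of dicts), skipping
    components whose marginal probability P(x) or P(y) is below `delta`."""
    n = len(records)
    px = Counter(r[x] for r in records)
    py = Counter(r[y] for r in records)
    pxy = Counter((r[x], r[y]) for r in records)

    mi = 0.0
    for (xs, ys), cnt in pxy.items():
        p_x, p_y = px[xs] / n, py[ys] / n
        if p_x < delta or p_y < delta:
            continue  # omit components involving rare marginal states
        p_xy = cnt / n
        mi += p_xy * log(p_xy / (p_x * p_y))
    return mi
```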
- The information processing apparatus according to claim 2, wherein the means (C):
for a random variable pair having an undirected edge, temporarily deletes the undirected edge if a path other than that edge exists between the pair, and re-adds the temporarily deleted undirected edge, in the same manner as the means (B), if it is necessary; and
when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node of the pair, temporarily deletes the undirected edge and, taking as a final condition set the set containing the nodes that lie on a path between the two nodes of the pair and are adjacent to either of them together with the nodes further adjacent to those adjacent nodes, re-adds the temporarily deleted undirected edge, in the same manner as the means (B), if it is necessary.
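The thinning behaviour of the means (C) can be sketched as below. The adjacency representation, the ordering trick for visiting each undirected edge once, and the callback `needs_edge`, assumed to run the conditional-independence test of the means (B), are all illustrative choices:

```python
def has_alternative_path(adj, u, v):
    """Depth-first check: can v be reached from u without the edge (u, v)?"""
    seen, stack = {u}, [u]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if node == u and nxt == v:
                continue  # ignore the direct edge under test
            if nxt == v:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def thin_edges(adj, needs_edge):
    """Thinning sketch: each undirected edge with an alternative path is
    removed temporarily and kept only if `needs_edge(adj, u, v)` still
    requires it. `adj` maps each node name (a string) to a set of
    neighbouring node names."""
    for u in list(adj):
        for v in list(adj[u]):
            if u >= v or not has_alternative_path(adj, u, v):
                continue  # visit each undirected edge once
            adj[u].discard(v)
            adj[v].discard(u)  # temporary removal
            if needs_edge(adj, u, v):
                adj[u].add(v)
                adj[v].add(u)  # re-add: the edge is still necessary
    return adj
```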
- An information processing apparatus that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the apparatus being configured to:
decide whether an edge should be added between a random variable pair by computing conditional mutual information;
in making that decision, for the two random variable nodes of the pair, compute conditional mutual information using, as condition sets, combinations of random variables taken from the set of nodes that lie on a path between the two nodes and are adjacent to either of them, add no edge between the two random variables if a combination is found for which the value is less than a first threshold, and otherwise add an edge; and
in the computation of the conditional mutual information, omit the computation of the corresponding component when the joint probability for the states of the two random variables and the state set corresponding to the conditioning random variables is less than a second threshold that is at most the first threshold.
- The information processing apparatus according to claim 4, configured to take, for the two random variable nodes of the random variable pair, as a final condition set the set of nodes that lie on a path between the two nodes and are adjacent to either of them, to compute the conditional mutual information for the two nodes and a condition set C while increasing the size of C from 1 up to the size of the final condition set, to add no edge between the two random variables if a conditional mutual information less than the first threshold is obtained in that process, and otherwise to judge an edge necessary and add it.
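One possible reading of the growing-condition-set test of claims 2 and 5 is sketched below; whether every subset of each size is tried or only a single nested sequence of condition sets is not spelled out here, so the subset enumeration is an assumption. `cmi` is expected to be a function such as `conditional_mutual_information` above, and `eps` stands in for the first threshold:

```python
from itertools import combinations


def edge_needed(records, x, y, final_cond, cmi, eps=0.01):
    """Growing-condition-set test: condition sets drawn from `final_cond`
    are tried from size 1 up to the full set; the first conditional mutual
    information below `eps` shows the pair to be separable, so no edge is
    needed."""
    cond = sorted(final_cond)
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            if cmi(records, x, y, list(subset)) < eps:
                return False  # a separating condition set was found
    return True  # no separating set found: the edge is needed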
- A program for performing Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the program causing a computer to execute:
(A) a step of generating a tree-structured graph for the input data, including adding, for each random variable pair whose mutual information is equal to or greater than a first threshold, an edge between the pair if the graph remains a tree after the edge is added; and
(B) a step of adding an edge, where one is needed, for each random variable pair whose mutual information is equal to or greater than the first threshold but to which no edge was added in the step (A), including:
a step of computing, for the two random variable nodes of a pair to which no edge was added, conditional mutual information using, as condition sets, combinations of random variables taken from the set of nodes that lie on a path between the two nodes and are adjacent to either of them, adding no edge between the two random variables if a combination is found for which the value is less than the first threshold, and otherwise adding an edge; and
a step of omitting, in the computation of the conditional mutual information, the computation of the corresponding component when the joint probability for the states of the two random variables and the state set corresponding to the conditioning random variables is less than a second threshold that is at most the first threshold.
- The program according to claim 6, wherein
the step (A) includes a step of computing the marginal probability of each random variable contained in the input data, and a step of computing the mutual information of each random variable pair while omitting the computation of a mutual-information component when the marginal probability that either random variable of the pair takes the corresponding state is less than the second threshold; and
the step (B) includes a step of taking, for the two random variable nodes of a pair to which no edge was added, as a final condition set the set of nodes that lie on a path between the two nodes and are adjacent to either of them, computing the conditional mutual information for the two nodes and a condition set C while increasing the size of C from 1 up to the size of the final condition set, adding no edge between the two random variables if a conditional mutual information less than the first threshold is obtained in that process, and otherwise judging an edge necessary and adding it;
the program further causing the computer to execute:
(C) a step of judging, for each random variable pair that has an edge after the processing of the step (B), whether the edge is necessary, and deleting the edge if it is unnecessary; and
(D) a step of orienting each edge.
- The program according to claim 7, wherein the step (C) includes:
a step of temporarily deleting, for a random variable pair having an undirected edge, the undirected edge if a path other than that edge exists between the pair;
a step of re-adding the temporarily deleted undirected edge for the pair, in the same manner as the step (B), if it is necessary;
a step of temporarily deleting the undirected edge when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node of the pair; and
a step of re-adding the temporarily deleted undirected edge, in the same manner as the step (B), taking as a final condition set the set containing the nodes that lie on a path between the two nodes of the pair and are adjacent to either of them together with the nodes further adjacent to those adjacent nodes, if it is necessary.
- A program that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the program causing a computer to execute a step of deciding whether an edge should be added between a random variable pair by computing conditional mutual information, the deciding step including:
a step of computing, for the two random variable nodes of the pair, conditional mutual information using, as condition sets, combinations of random variables taken from the set of nodes that lie on a path between the two nodes and are adjacent to either of them, adding no edge between the two random variables if a combination is found for which the value is less than a first threshold, and otherwise adding an edge; and
a step of omitting, in the computation of the conditional mutual information, the computation of the corresponding component when the joint probability for the states of the two random variables and the state set corresponding to the conditioning random variables is less than a second threshold that is at most the first threshold.
- The program according to claim 9, causing the computer to execute a step of taking, for the two random variable nodes of the random variable pair, as a final condition set the set of nodes that lie on a path between the two nodes and are adjacent to either of them, computing the conditional mutual information for the two nodes and a condition set C while increasing the size of C from 1 up to the size of the final condition set, adding no edge between the two random variables if a conditional mutual information less than the first threshold is obtained in that process, and otherwise judging an edge necessary and adding it.
- An information processing apparatus that performs Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the apparatus comprising:
(A) means for generating a tree-structured graph for the input data, the means adding, for each random variable pair whose mutual information is equal to or greater than a first threshold, an edge between the pair if the graph remains a tree after the edge is added; and
(B) means for adding an edge, where one is needed, for each random variable pair whose mutual information is equal to or greater than the first threshold but to which no edge was added by the means (A), wherein the means (B):
(B1) for the two random variable nodes of a pair to which no edge was added, takes as a candidate condition set the condition set containing the random variables in the set of nodes that lie on a path between the two nodes and are adjacent to either of them, computes the conditional mutual information using each single random variable in the candidate condition set as the condition set, and, if at least one of the computed values is less than the first threshold, ends the processing for the pair without adding an edge between the two random variables;
(B2) if the processing did not end in (B1), computes the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, ends the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold, deletes one of the two conditioning random variables from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that random variable as the condition set, and deletes the other conditioning random variable from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that other random variable as the condition set;
(B3) if the processing did not end in (B2), computes the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and ends the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold; and
(B4) if the processing did not end in (B1) through (B3), adds an edge between the two random variables.
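The staged test (B1) through (B4) can be sketched as follows; `candidate`, `cmi` and `eps` (the first threshold) are illustrative names, and the pruning rule in stage (B2) follows the claim wording literally:

```python
from itertools import combinations


def edge_needed_staged(records, x, y, candidate, cmi, eps=0.01):
    """Staged test sketch: `candidate` is the candidate condition set of
    the claims; `cmi` is assumed to compute conditional mutual information."""
    candidate = set(candidate)

    # (B1) condition sets of size one
    single = {c: cmi(records, x, y, [c]) for c in candidate}
    if any(v < eps for v in single.values()):
        return False  # a separating variable was found: no edge

    # (B2) condition sets of size two, pruning the candidate set as claimed
    for a, b in combinations(sorted(single), 2):
        if a not in candidate or b not in candidate:
            continue  # one of the pair was already pruned
        v = cmi(records, x, y, [a, b])
        if v < eps:
            return False
        if v > single[a]:
            candidate.discard(a)  # literal reading of the claim: drop a
        if v > single[b]:
            candidate.discard(b)

    # (B3) all remaining candidates at once
    if candidate and cmi(records, x, y, sorted(candidate)) < eps:
        return False

    # (B4) no separating set found: add the edge
    return True
```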
- The information processing apparatus according to claim 11, further comprising:
(C) means for judging, for each random variable pair that has an edge after the processing by the means (B), whether the edge is necessary, and deleting the edge if it is unnecessary; and
(D) means for orienting each edge.
- The information processing apparatus according to claim 12, wherein the means (C):
for a random variable pair having an undirected edge, temporarily deletes the undirected edge if a path other than that edge exists between the pair, and re-adds the temporarily deleted undirected edge, in the same manner as the means (B), if it is necessary; and
when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node of the pair, temporarily deletes the undirected edge and, taking as a candidate condition set the set containing the nodes that lie on a path between the two nodes of the pair and are adjacent to either of them together with the nodes further adjacent to those adjacent nodes, re-adds the temporarily deleted undirected edge, in the same manner as the means (B), if it is necessary.
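The enlarged candidate condition set used by this thinning test (neighbours on connecting paths plus their own neighbours) can be sketched as a small helper; `path_nodes` is assumed to already contain every node lying on a path between the two endpoints, and `adj` maps node names to neighbour sets:

```python
def extended_condition_set(adj, path_nodes, x, y):
    """Candidate condition set for the thinning test of claims 3, 13 and 17:
    neighbours of x or y lying on a path between them, plus the nodes
    adjacent to those neighbours (x and y themselves are excluded)."""
    first = {n for n in path_nodes if n in adj[x] or n in adj[y]}
    second = {m for n in first for m in adj[n]}
    return (first | second) - {x, y}
```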
- An information processing apparatus that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the apparatus being configured to decide whether an edge should be added between a random variable pair by computing conditional mutual information, and, in making that decision:
(A1) for the two random variable nodes of the pair, to take as a candidate condition set the condition set containing the random variables in the set of nodes that lie on a path between the two nodes and are adjacent to either of them, to compute the conditional mutual information using each single random variable in the candidate condition set as the condition set, and, if at least one of the computed values is less than a first threshold, to end the processing for the pair without adding an edge between the two random variables;
(A2) if the processing did not end in (A1), to compute the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, to end the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold, to delete one of the two conditioning random variables from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that random variable as the condition set, and to delete the other conditioning random variable from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that other random variable as the condition set;
(A3) if the processing did not end in (A2), to compute the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and to end the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold; and
(A4) if the processing did not end in (A1) through (A3), to add an edge between the two random variables.
- A program for performing Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the program causing a computer to execute:
(A) a step of generating a tree-structured graph for the input data, including adding, for each random variable pair whose mutual information is equal to or greater than a first threshold, an edge between the pair if the graph remains a tree after the edge is added; and
(B) a step of adding an edge, where one is needed, for each random variable pair whose mutual information is equal to or greater than the first threshold but to which no edge was added in the step (A), including:
(B1) for the two random variable nodes of a pair to which no edge was added, taking as a candidate condition set the condition set containing the random variables in the set of nodes that lie on a path between the two nodes and are adjacent to either of them, computing the conditional mutual information using each single random variable in the candidate condition set as the condition set, and, if at least one of the computed values is less than the first threshold, ending the processing for the pair without adding an edge between the two random variables;
(B2) if the processing did not end in (B1), computing the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, ending the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold, deleting one of the two conditioning random variables from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that random variable as the condition set, and deleting the other conditioning random variable from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that other random variable as the condition set;
(B3) if the processing did not end in (B2), computing the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and ending the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold; and
(B4) if the processing did not end in (B1) through (B3), adding an edge between the two random variables.
- The program according to claim 15, further causing the computer to execute:
(C) a step of judging, for each random variable pair that has an edge after the processing of the step (B), whether the edge is necessary, and deleting the edge if it is unnecessary; and
(D) a step of orienting each edge.
- The program according to claim 16, wherein the step (C) includes:
a step of temporarily deleting, for a random variable pair having an undirected edge, the undirected edge if a path other than that edge exists between the pair;
a step of re-adding the temporarily deleted undirected edge for the pair, in the same manner as the step (B), if it is necessary;
a step of temporarily deleting the undirected edge when at least one random variable node of a random variable pair having an undirected edge has three or more adjacent nodes other than the other node of the pair; and
a step of re-adding the temporarily deleted undirected edge, in the same manner as the step (B), taking as a candidate condition set the set containing the nodes that lie on a path between the two nodes of the pair and are adjacent to either of them together with the nodes further adjacent to those adjacent nodes, if it is necessary.
- A program that uses a constraint-based learning algorithm to perform Bayesian network structure learning from input data containing information about a plurality of random variables and the states each random variable takes, the program causing a computer to execute a step of deciding whether an edge should be added between a random variable pair by computing conditional mutual information, the deciding step including:
(A1) for the two random variable nodes of the pair, taking as a candidate condition set the condition set containing the random variables in the set of nodes that lie on a path between the two nodes and are adjacent to either of them, computing the conditional mutual information using each single random variable in the candidate condition set as the condition set, and, if at least one of the computed values is less than a first threshold, ending the processing for the pair without adding an edge between the two random variables;
(A2) if the processing did not end in (A1), computing the conditional mutual information using each pair of two random variables in the candidate condition set as the condition set, ending the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold, deleting one of the two conditioning random variables from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that random variable as the condition set, and deleting the other conditioning random variable from the candidate condition set if the computed value is greater than the conditional mutual information already computed with only that other random variable as the condition set;
(A3) if the processing did not end in (A2), computing the conditional mutual information using all the random variables remaining in the candidate condition set as the condition set, and ending the processing without adding an edge between the two random variables if at least one of the computed values is less than the first threshold; and
(A4) if the processing did not end in (A1) through (A3), adding an edge between the two random variables.
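To show how the claimed phases fit together, the following sketch composes the earlier example functions into a drafting-plus-thickening pass. It is an illustration under the stated assumptions, not the patented implementation; thinning and edge orientation would be applied to the returned skeleton afterwards. The helper `neighbours_on_paths` is an assumed approximation of the condition-set construction described in the claims:

```python
def neighbours_on_paths(adj, x, y):
    """Assumed helper: neighbours of x or y that lie on a path between them,
    approximated as neighbours from which the opposite node is reachable."""
    def reachable(src, dst, blocked):
        seen, stack = {blocked, src}, [src]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    out = set()
    for n in adj[x] - {y}:
        if reachable(n, y, x):
            out.add(n)
    for n in adj[y] - {x}:
        if reachable(n, x, y):
            out.add(n)
    return out


def learn_skeleton(records, variables, eps=0.01, delta=1e-4):
    """Drafting (A) followed by thickening (B), reusing the earlier sketches
    `mutual_information`, `conditional_mutual_information` and `edge_needed`."""
    # Drafting: rank pairs by mutual information and grow a tree skeleton.
    scored = [(mutual_information(records, x, y, delta), x, y)
              for i, x in enumerate(variables) for y in variables[i + 1:]]
    scored = sorted((s for s in scored if s[0] >= eps), reverse=True)

    parent = {v: v for v in variables}  # union-find keeps the graph a tree

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    adj = {v: set() for v in variables}
    deferred = []
    for _, x, y in scored:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry
            adj[x].add(y)
            adj[y].add(x)  # still a tree: add the edge
        else:
            deferred.append((x, y))  # revisit during thickening

    # Thickening: add a deferred edge only when the conditional test needs it.
    for x, y in deferred:
        cond = neighbours_on_paths(adj, x, y)
        if edge_needed(records, x, y, cond, conditional_mutual_information, eps):
            adj[x].add(y)
            adj[y].add(x)

    return adj  # thinning and orientation would then refine this skeleton
```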
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011525822A JP5555238B2 (en) | 2009-08-06 | 2010-05-27 | Information processing apparatus and program for Bayesian network structure learning |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009182993 | 2009-08-06 | ||
JP2009-182993 | 2009-08-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011016281A2 true WO2011016281A2 (en) | 2011-02-10 |
Family
ID=43544742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/058963 WO2011016281A2 (en) | 2009-08-06 | 2010-05-27 | Information processing device and program for learning bayesian network structure |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP5555238B2 (en) |
WO (1) | WO2011016281A2 (en) |
- 2010-05-27 JP JP2011525822A patent/JP5555238B2/en not_active Expired - Fee Related
- 2010-05-27 WO PCT/JP2010/058963 patent/WO2011016281A2/en active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015045091A1 (en) * | 2013-09-27 | 2015-04-02 | 株式会社シーエーシー | Method and program for extraction of super-structure in structural learning of bayesian network |
WO2018076916A1 (en) * | 2016-10-27 | 2018-05-03 | 中兴通讯股份有限公司 | Data publishing method and device, and terminal |
CN108009437A (en) * | 2016-10-27 | 2018-05-08 | 中兴通讯股份有限公司 | Data publication method and apparatus and terminal |
CN108009437B (en) * | 2016-10-27 | 2022-11-22 | 中兴通讯股份有限公司 | Data release method and device and terminal |
JP2020529777A (en) * | 2017-08-01 | 2020-10-08 | エルゼビア インコーポレイテッド | Systems and methods for extracting structures from large, high density, high noise networks |
WO2024180746A1 (en) * | 2023-03-01 | 2024-09-06 | 富士通株式会社 | Causality search program, causality search method, and information processing device |
Also Published As
Publication number | Publication date |
---|---|
JP5555238B2 (en) | 2014-07-23 |
JPWO2011016281A1 (en) | 2013-01-10 |
Similar Documents
Publication | Title |
---|---|
JP7392668B2 (en) | Data processing methods and electronic equipment | |
US11354365B1 (en) | Using aggregate compatibility indices to identify query results for queries having qualitative search terms | |
JP7169369B2 (en) | Method, system for generating data for machine learning algorithms | |
US11423082B2 (en) | Methods and apparatus for subgraph matching in big data analysis | |
US7818303B2 (en) | Web graph compression through scalable pattern mining | |
Addis et al. | Hybrid constructive heuristics for the critical node problem | |
US8903824B2 (en) | Vertex-proximity query processing | |
CN109325182B (en) | Information pushing method and device based on session, computer equipment and storage medium | |
US11748351B2 (en) | Class specific query processing | |
Ben-Shimon et al. | An ensemble method for top-N recommendations from the SVD | |
JP5555238B2 (en) | Information processing apparatus and program for Bayesian network structure learning | |
Guyet et al. | Incremental mining of frequent serial episodes considering multiple occurrences | |
CN101635001B (en) | Method and apparatus for extracting information from a database | |
Bernard et al. | Discovering customer journeys from evidence: a genetic approach inspired by process mining | |
CN109492844B (en) | Method and device for generating business strategy | |
CN106599122B (en) | Parallel frequent closed sequence mining method based on vertical decomposition | |
JPWO2013111287A1 (en) | SPARQL query optimization method | |
JP5945206B2 (en) | Product recommendation device, method and program | |
JP5964781B2 (en) | SEARCH DEVICE, SEARCH METHOD, AND SEARCH PROGRAM | |
JP5622880B2 (en) | Item recommendation system, item recommendation method, and item recommendation program | |
CN113918807A (en) | Data recommendation method and device, computing equipment and computer-readable storage medium | |
Li et al. | Optimizing ml inference queries under constraints | |
CN113469819A (en) | Recommendation method of fund product, related device and computer storage medium | |
Burashnikova et al. | Sequential learning over implicit feedback for robust large-scale recommender systems | |
JP6802109B2 (en) | Software specification analyzer and software specification analysis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10806281; Country of ref document: EP; Kind code of ref document: A2 |
| WWE | Wipo information: entry into national phase | Ref document number: 2011525822; Country of ref document: JP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 10806281; Country of ref document: EP; Kind code of ref document: A2 |