CN106599049A - Decision table data reduction method - Google Patents
- Publication number
- CN106599049A CN106599049A CN201610996984.2A CN201610996984A CN106599049A CN 106599049 A CN106599049 A CN 106599049A CN 201610996984 A CN201610996984 A CN 201610996984A CN 106599049 A CN106599049 A CN 106599049A
- Authority
- CN
- China
- Prior art keywords
- decision table
- decision
- attribute
- table data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a decision table data reduction method comprising the following steps. Step 1: judge whether the last condition attribute in the decision table data set is a core attribute of the decision table; if it is, add the data of that attribute to a reduction set R and execute step 2; otherwise, delete the last column of condition attribute data and re-execute step 1. Step 2: move the column data corresponding to the core attribute to the first column; once the termination condition is determined to be met, output the reduction set R, otherwise return to step 1. The method has the advantage of reducing decision table data simply and effectively.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a decision table data reduction method.
Background technology
At present, with the rapid development of data acquisition and storage technology, the problem of data redundancy has become increasingly severe: it not only wastes a great deal of storage space, but also significantly degrades the performance of data-based modeling and decision-making algorithms. Rough set theory is a theory specialized in reducing data and extracting useful information from it. Its core is data reduction: by deleting unimportant and redundant data and attributes, a new, simplified data set containing the complete information is obtained, providing a high-quality data source for data-based analysis, modeling, decision making, and so on.
Traditional data reduction methods usually adopt a heuristic reduction structure based on attribute significance. The scheme is as follows. Step 1: preprocess the data set and compute the core attribute set of the decision table. Step 2: compute the significance of each attribute. Step 3: select the attribute with maximum significance. Step 4: modify the data set based on all selected attributes. Step 5: judge whether the algorithm's termination condition is met; if so, output the selected attribute set, otherwise jump to step 2. The characteristic of traditional heuristic reduction methods is that they must compute attribute significance and the entire core attribute set of the decision table. The definition and computation of attribute significance in particular have attracted the attention of many researchers and produced a large body of results.
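As a rough illustration of the five-step scheme above, the additive-mode loop can be sketched as follows in Python. The functions `significance`, `core_attributes`, and `is_terminated` are placeholders supplied by the caller, not components defined in this document:

```python
def heuristic_reduction(C, significance, core_attributes, is_terminated):
    """Greedy significance-based reduction (additive mode), as described above."""
    R = list(core_attributes(C))                  # step 1: start from the core
    remaining = [a for a in C if a not in R]
    while remaining and not is_terminated(R):     # step 5: termination test
        # steps 2-3: score every remaining attribute, pick the most significant
        best = max(remaining, key=lambda a: significance(a, R))
        R.append(best)                            # step 4: update the selected set
        remaining.remove(best)
    return R
```

Note that `significance` is re-evaluated for every remaining attribute on every pass of the loop, which is exactly the repeated-computation cost criticized below.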
However, this traditional heuristic reduction structure has several shortcomings. First, significance is computed too many times: step 2 is executed repeatedly, so the significance of most attributes is computed many times. If step 4 uses an addition mode, significance must be computed (2|C| - |R| + 1) * |R| / 2 times; if step 4 uses a subtraction mode, it must be computed (|C| + |R| + 1) * (|C| - |R|) / 2 times. Therefore, no matter how simple the formula for attribute significance is, a large amount of time is wasted. Second, attribute significance suffers from an unresolved randomness problem: existing significance computations may yield multiple attributes sharing the maximum significance, and the usual solution is to choose one of them randomly in step 3, which has an unpredictable effect on the reduction result and the classification accuracy.
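For concreteness, the two operation counts quoted above can be evaluated for example sizes; the values |C| = 20 and |R| = 5 below are arbitrary illustrations, not figures from this document:

```python
def additive_count(C, R):
    """Significance evaluations in addition mode: (2|C| - |R| + 1) * |R| / 2."""
    return (2 * C - R + 1) * R // 2

def subtractive_count(C, R):
    """Significance evaluations in subtraction mode: (|C| + |R| + 1) * (|C| - |R|) / 2."""
    return (C + R + 1) * (C - R) // 2
```

With |C| = 20 and |R| = 5 this gives 90 and 195 evaluations respectively, so the significance formula is invoked far more often than there are attributes.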
Content of the invention
To overcome, or at least partially solve, the above problems, the present invention provides a simpler heuristic reduction structure, offering a structural-level theoretical basis and implementation method for the design of high-speed reduction algorithms.
According to one aspect of the present invention, a decision table data reduction method is provided, comprising: step 1, judging whether the last condition attribute in the decision table data set is a core attribute of the decision table; if so, adding the data of that attribute to a reduction set R and executing step 2; otherwise deleting the last column of condition attribute data and re-executing step 1; step 2, moving the column data corresponding to the core attribute to the first column and, after determining that the termination condition is met, outputting the reduction set R, otherwise returning to step 1.
The present application proposes a decision table data reduction method that replaces the traditional attribute significance computation with a core-attribute judgment; builds efficient core-attribute judgment and positive-region computation algorithms using sorting techniques; computes each attribute at most once, either retaining or discarding it; and continuously deletes redundant column data during the reduction process, lowering the time and space complexity of subsequent iterations. The advantages are as follows. 1. The invention overcomes the defects of the traditional significance-based reduction structure: the traditional concept of attribute significance is abandoned, no significance formula needs to be designed, and there is no randomness in attribute selection, so the results are objective and repeatable. 2. The structure of the method is simple: each attribute is computed at most once, whereas traditional methods compute attributes repeatedly; moreover, the invention does not need to compute the entire core attribute set of the decision table beforehand. 3. The method involves only sorting and comparison operations, so it is computationally simple, easy to implement on a single machine, and also suitable for running on big data platforms. 4. The method is fast: using the fast algorithm recommended by the invention, a complete reduction can be computed quickly.
Description of the drawings
Fig. 1 is a schematic flow chart of a decision table data reduction method according to an embodiment of the present invention.
Specific embodiment
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing and examples. The following examples illustrate the present invention but do not limit its scope.
The present invention mainly processes decision table data, expressed as S = <U, At, {V_a | a ∈ At}, {I_a | a ∈ At}>, where U is the set of all samples, At = C ∪ D, C = {c1, c2, …, cn} is called the condition attribute set, and D = {d} is called the decision attribute set.
Fig. 1 illustrates the overall flow of a decision table data reduction method in one particular embodiment of the present invention. On the whole, it comprises: step 1, judging whether the last condition attribute in the decision table data set is a core attribute of the decision table; if so, adding the data of that attribute to the reduction set R and executing step 2; otherwise deleting the last column of condition attribute data and re-executing step 1; step 2, moving the column data corresponding to the core attribute to the first column and, after determining that the termination condition is met, outputting the reduction set R, otherwise returning to step 1.
In one particular embodiment of the present invention, the decision table data reduction method further comprises, before step 1: deleting repeated samples in the decision table data set; and, if the decision table data set is inconsistent, changing the decision value of the inconsistent samples in the decision table data set to d0, where d0 is larger by 1 than the existing maximum decision value: d0 = max(d(x)) + 1.
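A minimal sketch of this preprocessing step, assuming each sample is represented as a tuple with its condition attribute values first and its decision value last (this representation is our assumption, not specified by the document):

```python
from collections import defaultdict

def preprocess(samples):
    """samples: list of (c1, ..., cn, d) tuples; returns the cleaned list."""
    samples = list(dict.fromkeys(samples))      # delete repeated samples
    d0 = max(s[-1] for s in samples) + 1        # d0 = max(d(x)) + 1
    groups = defaultdict(set)                   # decision values per condition part
    for s in samples:
        groups[s[:-1]].add(s[-1])
    # relabel every inconsistent sample's decision value as d0
    return [s[:-1] + (d0,) if len(groups[s[:-1]]) > 1 else s for s in samples]
```

Two samples are inconsistent when they agree on every condition attribute but carry different decision values; after relabeling, all such samples share the reserved value d0.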
In one particular embodiment of the present invention, the decision table data reduction method further comprises, before step 1: sorting the data set in ascending order and then checking every pair of adjacent samples xi, xi+1.
In one particular embodiment of the present invention, the decision table data reduction method further comprises, before step 1: setting up the reduction set R.
In one particular embodiment of the present invention, judging in step 1 whether the last condition attribute in the decision table data set is a core attribute of the decision table comprises: sorting the decision table data set in ascending order; if two adjacent samples xi, xi+1 satisfy the conditions cn(xi) < cn(xi+1), d(xi) ≠ d(xi+1), and all other attribute values are equal, then the last condition attribute is a core attribute of the decision table, where C is the condition attribute set and d is the decision attribute.
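The core-attribute test for the last condition attribute can be sketched as follows, under the same tuple representation assumption (condition values first, decision value last):

```python
def last_attr_is_core(samples):
    """True iff the last condition attribute is a core attribute.

    samples: list of (c1, ..., cn, d) tuples.
    """
    samples = sorted(samples)                  # ascending order
    for x, y in zip(samples, samples[1:]):
        same_others = x[:-2] == y[:-2]         # c1 .. c_{n-1} all equal
        cn_differs = x[-2] != y[-2]            # cn(xi) < cn(xi+1)
        d_differs = x[-1] != y[-1]             # d(xi) != d(xi+1)
        if same_others and cn_differs and d_differs:
            return True
    return False
```

A single pass over the sorted data suffices, which is what makes the sorting-based judgment cheap compared with significance computation.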
In one particular embodiment of the present invention, step 2 of the decision table data reduction method further comprises, after the column data corresponding to the core attribute is moved to the first column: deleting the positive region of the reduction set R.
In one particular embodiment of the present invention, the termination condition in step 2 is: the decision table data set is empty, or the decision values of all remaining data are d0.
In one particular embodiment of the present invention, deleting the positive region of the reduction set R comprises: sorting by the reduction set R in ascending order and checking every pair of adjacent samples xi, xi+1 as follows: if R(xi) ≠ R(xi+1), judge whether the number of decision values of the granule containing xi is 1; if so, delete that granule [xi]R.
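A sketch of this positive-region deletion, again assuming tuple-shaped samples with the decision value last; the parameter `r_idx` (our name) lists the column indices of the attributes in R:

```python
def delete_positive_region(samples, r_idx):
    """Remove every granule of R whose samples share one decision value."""
    key = lambda s: tuple(s[i] for i in r_idx)   # projection onto R
    samples = sorted(samples, key=key)
    kept, granule = [], []
    for s in samples + [None]:                   # None flushes the last granule
        if granule and (s is None or key(s) != key(granule[0])):
            if len({g[-1] for g in granule}) > 1:    # coarse granule: keep it
                kept.extend(granule)
            granule = []
        if s is not None:
            granule.append(s)
    return kept
```

Granules whose samples all share one decision value belong to the positive region of R and are dropped; coarse granules (more than one decision value) are kept for further reduction.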
In one particular embodiment of the present invention, the original decision table data set is as shown in Table 1.
Table 1: an original decision table data set
S1: Set up the reduction set R and sort the data set in ascending order (in a matlab environment, the sortrows function can be used for sorting); then check adjacent samples according to the recommended algorithm, obtaining the new decision table data set shown in Table 2.
Table 2: the data set after preprocessing
Here the decision values (i.e., the values in the last column) of all inconsistent samples have been changed to "4".
S2: Judge whether condition attribute c4 is a core attribute of the decision table. The concrete method is as follows. First compare the first and second samples: their c2, c3 values differ. Then compare the second and third samples: their c1, c3 values differ. Then compare the third and fourth samples: their c2, c3 values differ. And so on. In the end, no two adjacent samples satisfy the condition of the scheme, so attribute c4 is not a core attribute of the decision table.
S3: Delete the last column of data and return to step S2 to check attribute c3. It can be found that when comparing the fourth and fifth samples, their c1, c2 attribute values are identical (2 and 3 respectively) while their c3 attribute values differ (1 and 2 respectively) and the corresponding decision values also differ (3 and 0 respectively); therefore attribute c3 is a core attribute of the decision table. R = {c3}; go to the following step.
S4: First move the column corresponding to attribute c3 to the first column, then sort in ascending order, obtaining the data set shown in Table 3.
Table 3: the data set after sorting
According to the algorithm of the present invention, adjacent samples are checked. Since R = {c3}, the detailed process is as follows. Let gr = {x1}. Compare the first and second samples: their c3 values are identical, so they belong to the same granule, gr = {x1, x2}. Next, compare the second and third samples: their c3 values differ, so gr is judged; it has two decision values (0 and 3), so {x1, x2} is a coarse granule, and we set gr = {x3}. Next, compare the third and fourth samples: their c3 values are identical, gr = {x3, x4}. Compare the fourth and fifth samples: their c3 values are also identical, gr = {x3, x4, x5}. Compare the fifth and sixth samples: their c3 values differ, so gr is judged; it has multiple decision values, so {x3, x4, x5} is a coarse granule, and we set gr = {x6}. Next, compare the sixth and seventh samples: their c3 values differ, so gr is judged; it has only one decision value, so sample x6 belongs to the positive region, and we set gr = {x7}. Finally, since there is no further adjacent sample, the decision value of gr is judged directly; it has only one decision value, so x7 also belongs to the positive region. The positive region is therefore {x6, x7}. The data set obtained after deleting the positive region is shown in Table 4.
Table 4: the data set after deleting the positive region of R
S5: The termination condition is not met; jump to S2.
Executing S2, judge c2: comparing adjacent samples, it is found that the c3, c1 values of the third and fourth samples are identical while their c2 values differ and their decision values also differ; therefore attribute c2 is a core attribute of the decision table. R = {c3, c2}; go to S4.
Executing S4, move the column containing c2 to the first column and sort in ascending order, as shown in Table 5.
Table 5
Next, compute the positive region with respect to {c3, c2}: the positive region is {x1, x2, x3, x4, x5}, and after deleting it the data set is empty.
Executing S5: since the data set is empty, the algorithm terminates and outputs the final result R = {c3, c2}.
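The complete loop of the worked example can be sketched end to end. The following self-contained Python sketch uses a toy data set rather than Tables 1-5 (whose values are not fully reproduced here); the column layout and helper names are our assumptions, not the document's matlab implementation:

```python
def reduce_decision_table(samples):
    """samples: list of (c1, ..., cn, d) tuples; returns the reduct column indices."""
    cols = list(range(len(samples[0]) - 1))       # candidate condition columns
    R = []                                        # reduct (column indices)
    d0 = max(s[-1] for s in samples) + 1          # label reserved for inconsistency

    def key(s, columns):
        return tuple(s[c] for c in columns)

    def is_core(data, last, others):
        # step 1: after sorting, a core attribute shows an adjacent pair that
        # agrees on all other attributes but differs in `last` and the decision
        data = sorted(data, key=lambda s: key(s, others + [last]))
        return any(key(x, others) == key(y, others)
                   and x[last] != y[last] and x[-1] != y[-1]
                   for x, y in zip(data, data[1:]))

    def drop_positive_region(data):
        # step 2: delete every granule (equal projection onto R) whose
        # samples share a single decision value
        data = sorted(data, key=lambda s: key(s, R))
        kept, gran = [], []
        for s in data + [None]:                   # None flushes the last granule
            if gran and (s is None or key(s, R) != key(gran[0], R)):
                if len({g[-1] for g in gran}) > 1:
                    kept.extend(gran)
                gran = []
            if s is not None:
                gran.append(s)
        return kept

    while samples and cols and any(s[-1] != d0 for s in samples):
        last = cols[-1]
        if is_core(samples, last, R + cols[:-1]):
            R.append(last)
            cols.pop()
            samples = drop_positive_region(samples)
        else:
            cols.pop()                            # discard the redundant column
    return sorted(R)
```

On a toy table whose decision is determined entirely by the second condition attribute, the sketch keeps only that column; each attribute is examined at most once, mirroring the structure described above.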
In one particular embodiment of the present invention, related experiments show that the computational complexity of the method is approximately O(|U|·|C|·|R|). Tests based on nine standard UCI data sets (Dermatology, Backup_large.test, Breast-cancer-wisconsin, Tic-tac-toe, Kr_vs_kp, Mushroom, Ticdata2000, Letter-recognition, Shuttle_all) show that the computation time of this method is between 0.09% and 12.8% of that of traditional methods. Comparisons with the computation times of several other reduction algorithms are shown in the tables below.
Table 6: comparison of computing speed with algorithm FSPA-PR
Table 6 shows that the computation time of the method of the invention is between 0.12% and 17.6% of that of FSPA-PR. The comparison algorithm FSPA-PR comes from: Y.H. Qian, J.Y. Liang, W. Pedrycz, C.Y. Dang, Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010) 597-618.
Table 7: comparison of computing speed with algorithms ADM and OADM
Table 7 shows that the computation time of the method of the invention is between 0.03% and 1.09% of that of ADM, and between 3.18% and 27.79% of that of OADM. The comparison algorithms ADM and OADM come from: Z. Meng, Z. Shi, A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Information Sciences 179 (16) (2009) 2774-2793.
Table 8: comparison of computing speed with algorithm Q-ARA
Table 8 shows that the computation time of the method of the invention is less than 14.56% of that of Q-ARA. The comparison algorithm Q-ARA comes from: M. Li, C.X. Shang, S.Z. Feng, et al., Quick attribute reduction in inconsistent decision tables, Information Sciences 254 (2014) 155-180.
The run times of the comparison algorithms in the above experimental results are all taken from the cited references themselves, while the run time of this algorithm was obtained on a computer with performance similar to that used by the comparison algorithms (Intel(R) CPU G645, 2.9 GHz, 1.81 GB memory). The software platform is matlab. The comparison results show that the method of the invention computes very quickly.
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the protection scope of the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (7)
1. A decision table data reduction method, characterized by comprising:
step 1, judging whether the last condition attribute in a decision table data set is a core attribute of the decision table; if so, adding the data of that attribute to a reduction set R and executing step 2; otherwise deleting the last column of condition attribute data and re-executing step 1;
step 2, moving the column data corresponding to the core attribute of the decision table to the first column and, after determining that the termination condition is met, outputting the reduction set R; otherwise returning to step 1.
2. The method of claim 1, characterized in that, before step 1, it further comprises: deleting repeated samples in the decision table data set; and, if the decision table data set is inconsistent, changing the decision value of the inconsistent samples in the decision table data set to d0, where d0 is larger by 1 than the existing maximum decision value.
3. The method of claim 1, characterized in that, before step 1, it further comprises: setting up the reduction set R.
4. The method of claim 1, characterized in that judging in step 1 whether the last condition attribute in the decision table data set is a core attribute of the decision table comprises: sorting the decision table data set in ascending order; if two adjacent samples xi, xi+1 satisfy the conditions cn(xi) < cn(xi+1), d(xi) ≠ d(xi+1), and all other attribute values are equal, then the last condition attribute is a core attribute of the decision table, where C is the condition attribute set and d is the decision attribute.
5. The method of claim 1, characterized in that step 2 further comprises, after the column data corresponding to the core attribute of the decision table is moved to the first column: deleting the positive region of the reduction set R.
6. The method of claim 1 or 2, characterized in that the termination condition in step 2 is: the decision table data set is empty, or the decision values of all remaining data are d0.
7. The method of claim 5, characterized in that deleting the positive region of the reduction set R comprises: sorting by the reduction set R in ascending order and checking every pair of adjacent samples xi, xi+1 as follows: if R(xi) ≠ R(xi+1), judging whether the number of decision values of the granule containing xi is 1, and if so, deleting that granule [xi]R.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610996984.2A CN106599049B (en) | 2016-11-09 | 2016-11-09 | A kind of decision table Data Reduction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599049A true CN106599049A (en) | 2017-04-26 |
CN106599049B CN106599049B (en) | 2019-08-27 |
Family
ID=58590290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610996984.2A Active CN106599049B (en) | 2016-11-09 | 2016-11-09 | A kind of decision table Data Reduction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599049B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030215656A1 (en) * | 1998-12-30 | 2003-11-20 | Phillips Petroleum Company | Process for minimizing haze of extruded clear blends containing styrene/butadiene polymer and product thereof |
CN102262682A (en) * | 2011-08-19 | 2011-11-30 | 上海应用技术学院 | Rapid attribute reduction method based on rough classification knowledge discovery |
Non-Patent Citations (2)
Title |
---|
Feng Weibing, Zhang Mei: "Backward-deletion attribute reduction algorithm based on the core and improved condition discernibility", Computer Applications and Software * |
Huang Guoshun: "Decision table core and attribute reduction algorithms based on database systems", Journal of Computer Applications * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463781A (en) * | 2017-08-10 | 2017-12-12 | 中南大学 | A kind of Data Reduction processing method and processing device for blast furnace molten iron silicon content forecast model |
CN107463781B (en) * | 2017-08-10 | 2020-02-21 | 中南大学 | Data reduction processing method and device for blast furnace molten iron silicon content prediction model |
CN109978715A (en) * | 2017-12-28 | 2019-07-05 | 北京南瑞电研华源电力技术有限公司 | User side distributed generation resource Data Reduction method and device |
CN109657916A (en) * | 2018-11-19 | 2019-04-19 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of Fire risk assessment method, device and server |
CN109992587A (en) * | 2019-04-09 | 2019-07-09 | 中南大学 | Blast furnace molten iron silicon content based on big data forecasts determinant attribute decision method |
CN109992587B (en) * | 2019-04-09 | 2021-04-13 | 中南大学 | Blast furnace molten iron silicon content prediction key attribute judgment method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN106599049B (en) | 2019-08-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||