CN106599049A - Decision table data reduction method - Google Patents
- Publication number
- CN106599049A CN106599049A CN201610996984.2A CN201610996984A CN106599049A CN 106599049 A CN106599049 A CN 106599049A CN 201610996984 A CN201610996984 A CN 201610996984A CN 106599049 A CN106599049 A CN 106599049A
- Authority
- CN
- China
- Prior art keywords
- decision table
- decision
- attribute
- table data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a decision table data reduction method comprising the following steps. Step 1: judge whether the last condition attribute in the decision table data set is a core attribute of the decision table; if it is, add the data of that attribute to a reduction set R and execute step 2; otherwise, delete the last column of condition attribute data and re-execute step 1. Step 2: move the column data corresponding to the core attribute to the first column; once the termination condition is determined to be met, output the reduction set R, otherwise return to step 1. The method has the advantage of reducing decision table data simply and effectively.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a decision table data reduction method.
Background technology
At present, with the rapid development of data acquisition and storage technology, the problem of data redundancy has become increasingly severe: it not only wastes a great deal of storage space, but also significantly degrades the performance of data-based modeling and decision-making algorithms. Rough set theory is a theory specialized in reducing data and extracting useful information from it. Its core is data reduction: by deleting unimportant and redundant data and attributes, a new, simplified data set containing the complete information is obtained, providing a high-quality data source for data-based analysis, modeling, decision making, and so on.
Traditional data reduction methods usually adopt a heuristic reduction structure based on attribute significance. The scheme is as follows. Step 1: preprocess the data set and compute the core attribute set of the decision table. Step 2: compute the significance of each attribute. Step 3: select the attribute with maximum significance. Step 4: modify the data set based on all selected attributes. Step 5: judge whether the algorithm's termination condition is met; if so, output the selected attribute set, otherwise jump to step 2. The characteristic of traditional heuristic reduction methods is that they must compute attribute significance and the entire core attribute set of the decision table. The definition and computation of attribute significance in particular have attracted the attention of many researchers and produced a large body of results.
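As a rough illustration of the five-step scheme above, the additive-mode loop can be sketched as follows in Python. The functions `significance`, `core_attributes`, and `is_terminated` are placeholders supplied by the caller, not components defined in this document:

```python
def heuristic_reduction(C, significance, core_attributes, is_terminated):
    """Greedy significance-based reduction (additive mode), as described above."""
    R = list(core_attributes(C))                  # step 1: start from the core
    remaining = [a for a in C if a not in R]
    while remaining and not is_terminated(R):     # step 5: termination test
        # steps 2-3: score every remaining attribute, pick the most significant
        best = max(remaining, key=lambda a: significance(a, R))
        R.append(best)                            # step 4: update the selected set
        remaining.remove(best)
    return R
```

Note that `significance` is re-evaluated for every remaining attribute on every pass of the loop, which is exactly the repeated-computation cost criticized below.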
However, this traditional heuristic reduction structure has several shortcomings. First, significance is computed too many times: step 2 is executed repeatedly, so the significance of most attributes is computed many times. If step 4 uses an addition mode, significance must be computed (2|C| - |R| + 1) * |R| / 2 times; if step 4 uses a subtraction mode, it must be computed (|C| + |R| + 1) * (|C| - |R|) / 2 times. Therefore, no matter how simple the formula for attribute significance is, a large amount of time is wasted. Second, attribute significance suffers from an unresolved randomness problem: existing significance computations may yield multiple attributes sharing the maximum significance, and the usual solution is to choose one of them randomly in step 3, which has an unpredictable effect on the reduction result and the classification accuracy.
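For concreteness, the two operation counts quoted above can be evaluated for example sizes; the values |C| = 20 and |R| = 5 below are arbitrary illustrations, not figures from this document:

```python
def additive_count(C, R):
    """Significance evaluations in addition mode: (2|C| - |R| + 1) * |R| / 2."""
    return (2 * C - R + 1) * R // 2

def subtractive_count(C, R):
    """Significance evaluations in subtraction mode: (|C| + |R| + 1) * (|C| - |R|) / 2."""
    return (C + R + 1) * (C - R) // 2
```

With |C| = 20 and |R| = 5 this gives 90 and 195 evaluations respectively, so the significance formula is invoked far more often than there are attributes.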
Content of the invention
To overcome, or at least partially solve, the above problems, the present invention provides a simpler heuristic reduction structure, offering a structural-level theoretical basis and implementation method for the design of high-speed reduction algorithms.
According to one aspect of the present invention, a decision table data reduction method is provided, comprising: step 1, judging whether the last condition attribute in the decision table data set is a core attribute of the decision table; if so, adding the data of that attribute to a reduction set R and executing step 2; otherwise deleting the last column of condition attribute data and re-executing step 1; step 2, moving the column data corresponding to the core attribute to the first column and, after determining that the termination condition is met, outputting the reduction set R, otherwise returning to step 1.
The present application proposes a decision table data reduction method that replaces the traditional attribute significance computation with a core-attribute judgment; builds efficient core-attribute judgment and positive-region computation algorithms using sorting techniques; computes each attribute at most once, either retaining or discarding it; and continuously deletes redundant column data during the reduction process, lowering the time and space complexity of subsequent iterations. The advantages are as follows. 1. The invention overcomes the defects of the traditional significance-based reduction structure: the traditional concept of attribute significance is abandoned, no significance formula needs to be designed, and there is no randomness in attribute selection, so the results are objective and repeatable. 2. The structure of the method is simple: each attribute is computed at most once, whereas traditional methods compute attributes repeatedly; moreover, the invention does not need to compute the entire core attribute set of the decision table beforehand. 3. The method involves only sorting and comparison operations, so it is computationally simple, easy to implement on a single machine, and also suitable for running on big data platforms. 4. The method is fast: using the fast algorithm recommended by the invention, a complete reduction can be computed quickly.
Description of the drawings
Fig. 1 is a schematic flow chart of a decision table data reduction method according to an embodiment of the present invention.
Specific embodiment
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing and examples. The following examples illustrate the present invention but do not limit its scope.
The present invention mainly processes decision table data, expressed as S = <U, At, {V_a | a ∈ At}, {I_a | a ∈ At}>, where U is the set of all samples, At = C ∪ D, C = {c1, c2, …, cn} is called the condition attribute set, and D = {d} is called the decision attribute set.
Fig. 1 illustrates the overall flow of a decision table data reduction method in one particular embodiment of the present invention. On the whole, it comprises: step 1, judging whether the last condition attribute in the decision table data set is a core attribute of the decision table; if so, adding the data of that attribute to the reduction set R and executing step 2; otherwise deleting the last column of condition attribute data and re-executing step 1; step 2, moving the column data corresponding to the core attribute to the first column and, after determining that the termination condition is met, outputting the reduction set R, otherwise returning to step 1.
In one particular embodiment of the present invention, the decision table data reduction method further comprises, before step 1: deleting repeated samples in the decision table data set; and, if the decision table data set is inconsistent, changing the decision value of the inconsistent samples in the decision table data set to d0, where d0 is larger by 1 than the existing maximum decision value: d0 = max(d(x)) + 1.
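A minimal sketch of this preprocessing step, assuming each sample is represented as a tuple with its condition attribute values first and its decision value last (this representation is our assumption, not specified by the document):

```python
from collections import defaultdict

def preprocess(samples):
    """samples: list of (c1, ..., cn, d) tuples; returns the cleaned list."""
    samples = list(dict.fromkeys(samples))      # delete repeated samples
    d0 = max(s[-1] for s in samples) + 1        # d0 = max(d(x)) + 1
    groups = defaultdict(set)                   # decision values per condition part
    for s in samples:
        groups[s[:-1]].add(s[-1])
    # relabel every inconsistent sample's decision value as d0
    return [s[:-1] + (d0,) if len(groups[s[:-1]]) > 1 else s for s in samples]
```

Two samples are inconsistent when they agree on every condition attribute but carry different decision values; after relabeling, all such samples share the reserved value d0.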
In one particular embodiment of the present invention, the decision table data reduction method further comprises, before step 1: sorting the data set in ascending order and then checking every pair of adjacent samples xi, xi+1.
In one particular embodiment of the present invention, the decision table data reduction method further comprises, before step 1: setting up the reduction set R.
In one particular embodiment of the present invention, judging in step 1 whether the last condition attribute in the decision table data set is a core attribute of the decision table comprises: sorting the decision table data set in ascending order; if two adjacent samples xi, xi+1 satisfy the conditions cn(xi) < cn(xi+1), d(xi) ≠ d(xi+1), and all other attribute values are equal, then the last condition attribute is a core attribute of the decision table, where C is the condition attribute set and d is the decision attribute.
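The core-attribute test for the last condition attribute can be sketched as follows, under the same tuple representation assumption (condition values first, decision value last):

```python
def last_attr_is_core(samples):
    """True iff the last condition attribute is a core attribute.

    samples: list of (c1, ..., cn, d) tuples.
    """
    samples = sorted(samples)                  # ascending order
    for x, y in zip(samples, samples[1:]):
        same_others = x[:-2] == y[:-2]         # c1 .. c_{n-1} all equal
        cn_differs = x[-2] != y[-2]            # cn(xi) < cn(xi+1)
        d_differs = x[-1] != y[-1]             # d(xi) != d(xi+1)
        if same_others and cn_differs and d_differs:
            return True
    return False
```

A single pass over the sorted data suffices, which is what makes the sorting-based judgment cheap compared with significance computation.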
In one particular embodiment of the present invention, step 2 of the decision table data reduction method further comprises, after the column data corresponding to the core attribute is moved to the first column: deleting the positive region of the reduction set R.
In one particular embodiment of the present invention, the termination condition in step 2 is: the decision table data set is empty, or the decision values of all remaining data are d0.
In one particular embodiment of the present invention, deleting the positive region of the reduction set R comprises: sorting by the reduction set R in ascending order and checking every pair of adjacent samples xi, xi+1 as follows: if R(xi) ≠ R(xi+1), judge whether the number of decision values of the granule containing xi is 1; if so, delete that granule [xi]R.
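A sketch of this positive-region deletion, again assuming tuple-shaped samples with the decision value last; the parameter `r_idx` (our name) lists the column indices of the attributes in R:

```python
def delete_positive_region(samples, r_idx):
    """Remove every granule of R whose samples share one decision value."""
    key = lambda s: tuple(s[i] for i in r_idx)   # projection onto R
    samples = sorted(samples, key=key)
    kept, granule = [], []
    for s in samples + [None]:                   # None flushes the last granule
        if granule and (s is None or key(s) != key(granule[0])):
            if len({g[-1] for g in granule}) > 1:    # coarse granule: keep it
                kept.extend(granule)
            granule = []
        if s is not None:
            granule.append(s)
    return kept
```

Granules whose samples all share one decision value belong to the positive region of R and are dropped; coarse granules (more than one decision value) are kept for further reduction.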
In one particular embodiment of the present invention, the original decision table data set is as shown in Table 1.
Table 1: an original decision table data set
S1: Set up the reduction set R and sort the data set in ascending order (in a matlab environment, the sortrows function can be used for sorting); then check adjacent samples according to the recommended algorithm, obtaining the new decision table data set shown in Table 2.
Table 2: the data set after preprocessing
Here the decision values (i.e., the values in the last column) of all inconsistent samples have been changed to "4".
S2: Judge whether condition attribute c4 is a core attribute of the decision table. The concrete method is as follows. First compare the first and second samples: their c2, c3 values differ. Then compare the second and third samples: their c1, c3 values differ. Then compare the third and fourth samples: their c2, c3 values differ. And so on. In the end, no two adjacent samples satisfy the condition of the scheme, so attribute c4 is not a core attribute of the decision table.
S3: Delete the last column of data and return to step S2 to check attribute c3. It can be found that when comparing the fourth and fifth samples, their c1, c2 attribute values are identical (2 and 3 respectively) while their c3 attribute values differ (1 and 2 respectively) and the corresponding decision values also differ (3 and 0 respectively); therefore attribute c3 is a core attribute of the decision table. R = {c3}; go to the following step.
S4: First move the column corresponding to attribute c3 to the first column, then sort in ascending order, obtaining the data set shown in Table 3.
Table 3: the data set after sorting
According to the algorithm of the present invention, adjacent samples are checked. Since R = {c3}, the detailed process is as follows. Let gr = {x1}. Compare the first and second samples: their c3 values are identical, so they belong to the same granule, gr = {x1, x2}. Next, compare the second and third samples: their c3 values differ, so gr is judged; it has two decision values (0 and 3), so {x1, x2} is a coarse granule, and we set gr = {x3}. Next, compare the third and fourth samples: their c3 values are identical, gr = {x3, x4}. Compare the fourth and fifth samples: their c3 values are also identical, gr = {x3, x4, x5}. Compare the fifth and sixth samples: their c3 values differ, so gr is judged; it has multiple decision values, so {x3, x4, x5} is a coarse granule, and we set gr = {x6}. Next, compare the sixth and seventh samples: their c3 values differ, so gr is judged; it has only one decision value, so sample x6 belongs to the positive region, and we set gr = {x7}. Finally, since there is no further adjacent sample, the decision value of gr is judged directly; it has only one decision value, so x7 also belongs to the positive region. The positive region is therefore {x6, x7}. The data set obtained after deleting the positive region is shown in Table 4.
Table 4: the data set after deleting the positive region of R
S5: The termination condition is not met; jump to S2.
Executing S2, judge c2: comparing adjacent samples, it is found that the c3, c1 values of the third and fourth samples are identical while their c2 values differ and their decision values also differ; therefore attribute c2 is a core attribute of the decision table. R = {c3, c2}; go to S4.
Executing S4, move the column containing c2 to the first column and sort in ascending order, as shown in Table 5.
Table 5
Next, compute the positive region with respect to {c3, c2}: the positive region is {x1, x2, x3, x4, x5}, and after deleting it the data set is empty.
Executing S5: since the data set is empty, the algorithm terminates and outputs the final result R = {c3, c2}.
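The complete loop of the worked example can be sketched end to end. The following self-contained Python sketch uses a toy data set rather than Tables 1-5 (whose values are not fully reproduced here); the column layout and helper names are our assumptions, not the document's matlab implementation:

```python
def reduce_decision_table(samples):
    """samples: list of (c1, ..., cn, d) tuples; returns the reduct column indices."""
    cols = list(range(len(samples[0]) - 1))       # candidate condition columns
    R = []                                        # reduct (column indices)
    d0 = max(s[-1] for s in samples) + 1          # label reserved for inconsistency

    def key(s, columns):
        return tuple(s[c] for c in columns)

    def is_core(data, last, others):
        # step 1: after sorting, a core attribute shows an adjacent pair that
        # agrees on all other attributes but differs in `last` and the decision
        data = sorted(data, key=lambda s: key(s, others + [last]))
        return any(key(x, others) == key(y, others)
                   and x[last] != y[last] and x[-1] != y[-1]
                   for x, y in zip(data, data[1:]))

    def drop_positive_region(data):
        # step 2: delete every granule (equal projection onto R) whose
        # samples share a single decision value
        data = sorted(data, key=lambda s: key(s, R))
        kept, gran = [], []
        for s in data + [None]:                   # None flushes the last granule
            if gran and (s is None or key(s, R) != key(gran[0], R)):
                if len({g[-1] for g in gran}) > 1:
                    kept.extend(gran)
                gran = []
            if s is not None:
                gran.append(s)
        return kept

    while samples and cols and any(s[-1] != d0 for s in samples):
        last = cols[-1]
        if is_core(samples, last, R + cols[:-1]):
            R.append(last)
            cols.pop()
            samples = drop_positive_region(samples)
        else:
            cols.pop()                            # discard the redundant column
    return sorted(R)
```

On a toy table whose decision is determined entirely by the second condition attribute, the sketch keeps only that column; each attribute is examined at most once, mirroring the structure described above.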
In one particular embodiment of the present invention, related experiments show that the computational complexity of the method is approximately O(|U|·|C|·|R|). Tests based on nine standard UCI data sets (Dermatology, Backup_large.test, Breast-cancer-wisconsin, Tic-tac-toe, Kr_vs_kp, Mushroom, Ticdata2000, Letter-recognition, Shuttle_all) show that the computation time of this method is between 0.09% and 12.8% of that of traditional methods. Comparisons with the computation times of several other reduction algorithms are shown in the tables below.
Table 6: comparison of computing speed with algorithm FSPA-PR
Table 6 shows that the computation time of the method of the invention is between 0.12% and 17.6% of that of FSPA-PR. The comparison algorithm FSPA-PR comes from: Y.H. Qian, J.Y. Liang, W. Pedrycz, C.Y. Dang, Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010) 597-618.
Table 7: comparison of computing speed with algorithms ADM and OADM
Table 7 shows that the computation time of the method of the invention is between 0.03% and 1.09% of that of ADM, and between 3.18% and 27.79% of that of OADM. The comparison algorithms ADM and OADM come from: Z. Meng, Z. Shi, A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Information Sciences 179 (16) (2009) 2774-2793.
Table 8: comparison of computing speed with algorithm Q-ARA
Table 8 shows that the computation time of the method of the invention is less than 14.56% of that of Q-ARA. The comparison algorithm Q-ARA comes from: M. Li, C.X. Shang, S.Z. Feng, et al., Quick attribute reduction in inconsistent decision tables, Information Sciences 254 (2014) 155-180.
The run times of the comparison algorithms in the above experimental results are all taken from the cited references themselves, while the run time of this algorithm was obtained on a computer with performance similar to that used by the comparison algorithms (Intel(R) CPU G645, 2.9 GHz, 1.81 GB memory). The software platform is matlab. The comparison results show that the method of the invention computes very quickly.
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the protection scope of the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (7)
1. A decision table data reduction method, characterized by comprising:
step 1, judging whether the last condition attribute in a decision table data set is a core attribute of the decision table; if so, adding the data of that attribute to a reduction set R and executing step 2; otherwise deleting the last column of condition attribute data and re-executing step 1;
step 2, moving the column data corresponding to the core attribute of the decision table to the first column and, after determining that the termination condition is met, outputting the reduction set R; otherwise returning to step 1.
2. The method of claim 1, characterized in that, before step 1, it further comprises: deleting repeated samples in the decision table data set; and, if the decision table data set is inconsistent, changing the decision value of the inconsistent samples in the decision table data set to d0, where d0 is larger by 1 than the existing maximum decision value.
3. The method of claim 1, characterized in that, before step 1, it further comprises: setting up the reduction set R.
4. The method of claim 1, characterized in that judging in step 1 whether the last condition attribute in the decision table data set is a core attribute of the decision table comprises: sorting the decision table data set in ascending order; if two adjacent samples xi, xi+1 satisfy the conditions cn(xi) < cn(xi+1), d(xi) ≠ d(xi+1), and all other attribute values are equal, then the last condition attribute is a core attribute of the decision table, where C is the condition attribute set and d is the decision attribute.
5. The method of claim 1, characterized in that step 2 further comprises, after the column data corresponding to the core attribute of the decision table is moved to the first column: deleting the positive region of the reduction set R.
6. The method of claim 1 or 2, characterized in that the termination condition in step 2 is: the decision table data set is empty, or the decision values of all remaining data are d0.
7. The method of claim 5, characterized in that deleting the positive region of the reduction set R comprises: sorting by the reduction set R in ascending order and checking every pair of adjacent samples xi, xi+1 as follows: if R(xi) ≠ R(xi+1), judging whether the number of decision values of the granule containing xi is 1, and if so, deleting that granule [xi]R.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610996984.2A CN106599049B (en) | 2016-11-09 | 2016-11-09 | A kind of decision table Data Reduction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599049A true CN106599049A (en) | 2017-04-26 |
CN106599049B CN106599049B (en) | 2019-08-27 |
Family
ID=58590290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610996984.2A Active CN106599049B (en) | 2016-11-09 | 2016-11-09 | A kind of decision table Data Reduction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599049B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030215656A1 (en) * | 1998-12-30 | 2003-11-20 | Phillips Petroleum Company | Process for minimizing haze of extruded clear blends containing styrene/butadiene polymer and product thereof |
CN102262682A (en) * | 2011-08-19 | 2011-11-30 | 上海应用技术学院 | Rapid attribute reduction method based on rough classification knowledge discovery |
Non-Patent Citations (2)
Title |
---|
Feng Weibing, Zhang Mei: "Backward-deletion attribute reduction algorithm based on the core and improved condition discernibility", Computer Applications and Software * |
Huang Guoshun: "Decision table core and attribute reduction algorithms based on database systems", Journal of Computer Applications * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463781A (en) * | 2017-08-10 | 2017-12-12 | 中南大学 | A kind of Data Reduction processing method and processing device for blast furnace molten iron silicon content forecast model |
CN107463781B (en) * | 2017-08-10 | 2020-02-21 | 中南大学 | Data reduction processing method and device for blast furnace molten iron silicon content prediction model |
CN109978715A (en) * | 2017-12-28 | 2019-07-05 | 北京南瑞电研华源电力技术有限公司 | User side distributed generation resource Data Reduction method and device |
CN109657916A (en) * | 2018-11-19 | 2019-04-19 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of Fire risk assessment method, device and server |
CN109992587A (en) * | 2019-04-09 | 2019-07-09 | 中南大学 | Blast furnace molten iron silicon content based on big data forecasts determinant attribute decision method |
CN109992587B (en) * | 2019-04-09 | 2021-04-13 | 中南大学 | Blast furnace molten iron silicon content prediction key attribute judgment method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN106599049B (en) | 2019-08-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||