
CN106599049A - Decision table data reduction method - Google Patents

Decision table data reduction method Download PDF

Info

Publication number
CN106599049A
CN106599049A CN201610996984.2A CN201610996984A CN106599049A CN 106599049 A CN106599049 A CN 106599049A CN 201610996984 A CN201610996984 A CN 201610996984A CN 106599049 A CN106599049 A CN 106599049A
Authority
CN
China
Prior art keywords
decision table
decision
attribute
table data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610996984.2A
Other languages
Chinese (zh)
Other versions
CN106599049B (en
Inventor
尹林子
许雪梅
丁家峰
蒋昭辉
李乐
李靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN201610996984.2A priority Critical patent/CN106599049B/en
Publication of CN106599049A publication Critical patent/CN106599049A/en
Application granted granted Critical
Publication of CN106599049B publication Critical patent/CN106599049B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a decision table data reduction method comprising the following steps. Step 1: judge whether the last condition attribute in the decision table data set is a core attribute of the decision table; if it is, add the data of that attribute to a reduction set R and execute step 2; otherwise, delete the last column of condition attribute data and re-execute step 1. Step 2: move the column of data corresponding to the core attribute to the first column; once a termination condition is determined to be met, output the reduction set R, otherwise return to step 1. The method has the advantage of reducing decision table data simply and effectively.

Description

Decision table data reduction method
Technical field
The present invention relates to the technical field of data processing, and in particular to a decision table data reduction method.
Background technology
At present, with the rapid development of data acquisition and storage technology, data redundancy has become an increasingly serious problem: it not only wastes a great deal of storage space, but also significantly degrades the performance of modeling, decision-making, and other data-driven algorithms. Rough set theory is a theory dedicated to reducing data and extracting effective information from it. Its core is data reduction: by deleting unimportant and redundant attributes, a new, simplified data set that still contains the complete information is obtained, providing high-quality source data for data analysis, modeling, decision-making, and so on.
Traditional data reduction methods usually adopt a heuristic reduction structure based on attribute significance. The scheme is as follows. Step 1: preprocess the data set and compute the core attribute set of the decision table. Step 2: compute the significance of each attribute. Step 3: select the attribute with the maximum significance. Step 4: modify the data set based on all the attributes selected so far. Step 5: judge whether the algorithm termination condition is met; if so, output the selected attribute set, otherwise jump to step 2. The characteristic of the traditional heuristic reduction method is that it must compute attribute significance and the complete core attribute set of the decision table. The definition and computation of attribute significance in particular has attracted the attention of many researchers and produced a large number of results.
However, this traditional heuristic reduction structure has shortcomings. First, significance is computed too many times: step 2 is executed repeatedly, so most attributes have their significance computed several times. If step 4 uses the addition mode, significance must be computed (2|C| - |R| + 1) * |R| / 2 times; if step 4 uses the subtraction mode, it must be computed (|C| + |R| + 1) * (|C| - |R|) / 2 times. Therefore, no matter how simple the significance formula is, a large amount of time is wasted. Second, attribute-significance heuristics suffer from an unresolved randomness problem: existing significance computations may yield several attributes sharing the maximum significance, and the usual solution is to choose one at random in step 3, which has an unpredictable effect on the reduction result and on classification accuracy.
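As a quick arithmetic check of these counts (a sketch; the function names are mine, not from the patent), the addition-mode total is the telescoping sum |C| + (|C|-1) + ... + (|C|-|R|+1), and the subtraction-mode total is the sum (|R|+1) + (|R|+2) + ... + |C|:

```python
def additive_evaluations(c, r):
    """Significance evaluations in addition mode: round k (k = 0..r-1)
    scores the c - k remaining candidate attributes, which sums to
    (2c - r + 1) * r / 2."""
    return (2 * c - r + 1) * r // 2

def subtractive_evaluations(c, r):
    """Significance evaluations in subtraction mode: (c + r + 1) * (c - r) / 2,
    i.e. the sum (r+1) + (r+2) + ... + c while c - r attributes are removed."""
    return (c + r + 1) * (c - r) // 2

# Both closed forms agree with the explicit sums:
assert additive_evaluations(10, 3) == 10 + 9 + 8 == 27
assert subtractive_evaluations(10, 3) == sum(range(4, 11)) == 49
```

Either way the count grows with |C| and |R|, which is the quadratic cost the invention sets out to avoid.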
Summary of the invention
To overcome, or at least partially solve, the above problems, the present invention provides a simpler heuristic reduction structure, offering structural-level theoretical guidance and an implementation method for the design of high-speed reduction algorithms.
According to one aspect of the present invention, a decision table data reduction method is provided, comprising: step 1, judging whether the last condition attribute in the decision table data set is a core attribute of the decision table; if it is, adding the data of that attribute to a reduction set R and executing step 2; otherwise, deleting the last column of condition attribute data and re-executing step 1; and step 2, moving the column of data corresponding to the core attribute to the first column; once a termination condition is determined to be met, outputting the reduction set R, otherwise returning to step 1.
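The two steps above can be sketched end to end as follows. This is an illustrative reading of the method, not the patent's reference implementation: the helper names, the column bookkeeping (tracking indices instead of physically moving the core column to the front), and the assumption of a deduplicated, consistent input table are all my own choices.

```python
from collections import defaultdict

def last_attr_is_core(rows, others, test):
    """After sorting, the tested attribute is core iff two adjacent samples
    agree on all `others` columns, differ on `test`, and differ in decision."""
    proj = sorted((tuple(r[c] for c in others), r[test], r[-1]) for r in rows)
    return any(oa == ob and ta != tb and da != db
               for (oa, ta, da), (ob, tb, db) in zip(proj, proj[1:]))

def positive_region(rows, r_cols):
    """Indices of samples whose R-granule carries a single decision value."""
    decisions, members = defaultdict(set), defaultdict(list)
    for i, r in enumerate(rows):
        key = tuple(r[c] for c in r_cols)
        decisions[key].add(r[-1])
        members[key].append(i)
    return {i for key, ds in decisions.items() if len(ds) == 1
            for i in members[key]}

def reduce_table(rows):
    """rows: tuples (c1, ..., cn, d). Returns the reduct as column indices."""
    rows = list(rows)
    active = list(range(len(rows[0]) - 1))  # candidate condition columns
    reduct = []
    while rows and active:
        test = active.pop()                  # step 1: take the last attribute
        if last_attr_is_core(rows, reduct + active, test):
            reduct.append(test)              # keep it in R ...
            pos = positive_region(rows, reduct)
            rows = [r for i, r in enumerate(rows) if i not in pos]  # step 2
        # otherwise the column is simply dropped (ignored from now on)
    return reduct

# c3 (index 2) alone already decides d on this four-sample table:
table = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
print(reduce_table(table))  # -> [2]
```

Note that each attribute is examined exactly once, matching the structure claimed: it is either appended to R or discarded, and rows in the positive region of R are deleted as soon as R grows.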
The present application proposes a decision table data reduction method that replaces the traditional attribute significance computation with a core-attribute test; builds efficient core-attribute and positive-region algorithms on sorting techniques; computes each attribute at most once (each attribute is either retained or discarded); and continually deletes redundant columns of data during the process, reducing the time and space complexity of the subsequent steps. Its advantages are: 1. It overcomes the defects of the traditional significance-based reduction structure: the concept of attribute significance is abandoned, no significance formula needs to be designed, there is no randomness in the heuristic, and the result is objective and repeatable. 2. The structure of the method is simple: each attribute is computed at most once, whereas traditional methods compute each attribute repeatedly; moreover, the complete core attribute set does not have to be computed before the heuristic starts. 3. The method involves only sorting and comparison operations, so the computation is simple; it is not only easy to implement on a single machine but also suitable for running on big data platforms. 4. The method is fast: with the fast algorithms recommended by the present invention, a complete reduction can be computed quickly.
Description of the drawings
Fig. 1 is a schematic flow chart of a decision table data reduction method according to an embodiment of the present invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but do not limit its scope.
The present invention mainly processes decision table data, expressed as S = <U, At, {Va | a ∈ At}, {Ia | a ∈ At}>, where U is the set of all samples, At = C ∪ D, C = {c1, c2, ..., cn} is called the condition attribute set, and D = {d} is called the decision attribute set.
Fig. 1 illustrates the overall flow of a decision table data reduction method in one embodiment of the present invention. On the whole it comprises: step 1, judging whether the last condition attribute in the decision table data set is a core attribute; if it is, adding the data of that attribute to the reduction set R and executing step 2; otherwise, deleting the last column of condition attribute data and re-executing step 1; and step 2, moving the column corresponding to the core attribute to the first column and, once the termination condition is met, outputting the reduction set R.
In one embodiment, the decision table data reduction method further comprises, before step 1: deleting repeated samples from the decision table data set; and, if the decision table data set is inconsistent, changing the decision value of every inconsistent sample to d0, where d0 is one larger than the existing maximum decision value: d0 = max(d(x)) + 1.
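A minimal sketch of this preprocessing step (the names are mine; the relabel-then-deduplicate order is an implementation choice, since relabelling inconsistent samples to d0 can itself create duplicates):

```python
from collections import defaultdict

def preprocess(rows):
    """rows: tuples (c1, ..., cn, d). Delete repeated samples, then give
    every inconsistent sample (same condition values, different decisions)
    the fresh decision value d0 = max(d) + 1."""
    rows = sorted(set(rows))                 # delete repeated samples
    d0 = max(r[-1] for r in rows) + 1
    seen = defaultdict(set)
    for r in rows:
        seen[r[:-1]].add(r[-1])
    relabelled = [r[:-1] + (d0,) if len(seen[r[:-1]]) > 1 else r
                  for r in rows]
    return sorted(set(relabelled))           # relabelling may re-create duplicates

# The first two samples conflict on identical conditions, so both become
# d0 = 2 and collapse into one row:
print(preprocess([(1, 0, 0), (1, 0, 1), (2, 2, 1)]))  # -> [(1, 0, 2), (2, 2, 1)]
```

After this step every remaining sample with identical condition values has the same decision, which is what makes the later adjacent-sample scans exact.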
In one embodiment, the decision table data reduction method further comprises, before step 1: sorting the data set in ascending order and then examining every pair of adjacent samples xi, xi+1.
In one embodiment, the decision table data reduction method further comprises, before step 1: initializing the reduction set R as the empty set.
In one embodiment, judging in step 1 whether the last condition attribute in the decision table data set is a core attribute comprises: sorting the decision table data set in ascending order; if two adjacent samples xi, xi+1 satisfy cn(xi) < cn(xi+1) and d(xi) ≠ d(xi+1) while all their other attribute values are equal, then the last condition attribute is a core attribute of the decision table, where C is the condition attribute set and d is the decision attribute.
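This adjacency test can be sketched compactly (assuming, as in the embodiment, that the table has already been deduplicated and made consistent, so a single sorted pass suffices):

```python
def last_attribute_is_core(rows):
    """rows: tuples (c1, ..., cn, d); the last condition attribute sits at
    index -2.  After an ascending sort, cn is a core attribute iff some
    adjacent pair agrees on c1..c(n-1), differs on cn, and differs on d."""
    rows = sorted(rows)
    return any(a[:-2] == b[:-2] and a[-2] != b[-2] and a[-1] != b[-1]
               for a, b in zip(rows, rows[1:]))

# c3 separates two samples with equal c1, c2 but different decisions -> core:
print(last_attribute_is_core([(0, 0, 1, 0), (0, 0, 2, 1), (1, 1, 1, 0)]))  # -> True
# Same shape but equal decisions -> not core:
print(last_attribute_is_core([(0, 0, 1, 0), (0, 0, 2, 0), (1, 1, 1, 0)]))  # -> False
```

The ascending sort places samples sharing c1..c(n-1) next to each other, so one linear scan over adjacent pairs decides the question without computing any significance measure.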
In one embodiment, step 2 of the decision table data reduction method further comprises, after moving the column of data corresponding to the core attribute to the first column: deleting the positive region of the reduction set R.
In one embodiment, the termination condition in step 2 is: the decision table data set is empty, or the decision values of all remaining data are d0.
In one embodiment, deleting the positive region of the reduction set R comprises: sorting by R in ascending order and checking every pair of adjacent samples xi, xi+1; if R(xi) ≠ R(xi+1), judging whether the number of decision values in the granule containing xi is 1, and if so deleting that granule [xi]R.
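A sketch of this granule scan. The data here are hypothetical, shaped to mirror the worked example later in the description: after c3 is moved to the front, granules {x1, x2} and {x3, x4, x5} are coarse (more than one decision value) and the positive region {x6, x7} is deleted.

```python
from itertools import groupby

def delete_positive_region(rows, r_cols):
    """Sort by the R columns, split adjacent samples into granules
    (equivalence classes of R), and delete every granule whose samples
    share exactly one decision value -- the positive region of R."""
    def rkey(r):
        return tuple(r[c] for c in r_cols)
    kept = []
    for _, granule_iter in groupby(sorted(rows, key=rkey), key=rkey):
        granule = list(granule_iter)
        if len({r[-1] for r in granule}) > 1:  # coarse granule: keep it
            kept.extend(granule)
    return kept

# Rows are (c3, d) pairs; c3 is column 0 after the move-to-front step.
table = [(1, 0), (1, 3), (2, 1), (2, 2), (2, 1), (3, 0), (4, 2)]
print(delete_positive_region(table, [0]))
# -> [(1, 0), (1, 3), (2, 1), (2, 2), (2, 1)]   (granules {x6}, {x7} deleted)
```

Because the rows are sorted by R first, granule boundaries fall exactly where adjacent R-values differ, which is the adjacent-pair check described above.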
In one embodiment, the original decision table data set is as shown in Table 1.
Table 1. An original decision table data set
S1: initialize the reduction set R, sort the data set in ascending order (in a Matlab environment the sortrows function can be used), and then check adjacent samples according to the recommended algorithm, obtaining the new decision table data set shown in Table 2.
Table 2. The data set after preprocessing
Here, the decision value (that is, the value in the last column) of every inconsistent sample has been changed to "4".
S2: judge whether condition attribute c4 is a core attribute. The concrete method is as follows: first compare the first and second samples — their c2, c3 values differ; then compare the second and third samples — their c1, c3 values differ; then compare the third and fourth samples — their c2, c3 values differ; and so on. In the end, no pair of adjacent samples satisfies the condition of the scheme, so attribute c4 is not a core attribute.
S3: delete the last column of data and return to step S2 to test attribute c3. This time it can be found that, when comparing the fourth and fifth samples, their c1, c2 values are identical (2 and 3), their c3 values differ (1 and 2), and their decision values also differ (3 and 0); therefore attribute c3 is a core attribute. R = {c3}; go to the following steps.
S4: first move the column corresponding to attribute c3 to the first column, then sort in ascending order, obtaining the data set shown in Table 3.
Table 3. The data set after sorting
According to the algorithm of the present invention, adjacent samples are checked. Since R = {c3}, the detailed process is as follows. Let gr = {x1}. Compare the first and second samples: their c3 values are identical, so they belong to the same granule, and gr = {x1, x2}. Next compare the second and third samples: their c3 values differ, so gr is judged; it contains two decision values (0 and 3), so {x1, x2} is a coarse granule; let gr = {x3}. Next compare the third and fourth samples: their c3 values are identical, gr = {x3, x4}; then the fourth and fifth samples: their c3 values are also identical, gr = {x3, x4, x5}. Next compare the fifth and sixth samples: their c3 values differ, so gr is judged; it contains multiple decision values, so {x3, x4, x5} is a coarse granule; let gr = {x6}. Next compare the sixth and seventh samples: their c3 values differ, so gr is judged; it contains only one decision value, so sample x6 belongs to the positive region; let gr = {x7}. Finally, since there is no further adjacent sample, the decision values of gr are judged directly; there is only one decision value, so x7 also belongs to the positive region. The positive region is therefore {x6, x7}. The data set obtained after deleting the positive region is shown in Table 4.
Table 4. The data set after deleting the positive region of R
S5: the termination condition is not met; jump to S2.
S2 is executed to judge c2. Comparing adjacent samples shows that the third and fourth samples have identical c3, c1 values but different c2 values and different decision values; therefore attribute c2 is a core attribute, and R = {c3, c2}. Go to S4.
S4 is executed: the column containing c2 is moved to the first column and the data set is sorted in ascending order, as shown in Table 5.
Table 5
Next, the positive region with respect to {c3, c2} is computed; it is {x1, x2, x3, x4, x5}. After the positive region is deleted, the data set is empty.
S5 is executed: since the data set is empty, the algorithm terminates and outputs the final result R = {c3, c2}.
In one embodiment, related experiments show that the computational complexity of the present invention is approximately O(|U|·|C|·|R|). Tests based on nine standard UCI data sets (Dermatology, Backup_large.test, Breast-cancer-wisconsin, Tic-tac-toe, Kr_vs_kp, Mushroom, Ticdata2000, Letter-recognition, Shuttle_all) show that the computation time of this method is between 0.09% and 12.8% of that of traditional methods. Comparisons with the computation times of several other reduction algorithms are shown in the tables below.
Table 6. Comparison with the computation speed of algorithm FSPA-PR
Table 6 shows that the computation time of the method of the present invention is between 0.12% and 17.6% of that of FSPA-PR. The compared algorithm FSPA-PR comes from: Y. H. Qian, J. Y. Liang, W. Pedrycz, C. Y. Dang, Positive approximation: an accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010) 597–618.
Table 7. Comparison with the computation speeds of ADM and OADM
Table 7 shows that the computation time of the method of the present invention is between 0.03% and 1.09% of that of ADM, and between 3.18% and 27.79% of that of OADM. The compared algorithms ADM and OADM come from: Z. Meng, Z. Shi, A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Information Sciences 179 (16) (2009) 2774–2793.
Table 8. Comparison with the computation speed of Q-ARA
Table 8 shows that the computation time of the method of the present invention is less than 14.56% of that of Q-ARA. The compared algorithm Q-ARA comes from: M. Li, C. X. Shang, S. Z. Feng, et al., Quick attribute reduction in inconsistent decision tables, Information Sciences 254 (2014) 155–180.
In the experimental results above, the run times of the compared algorithms are taken from the cited references themselves, while the run time of the present algorithm was obtained on a computer of similar performance (Intel(R) CPU G645, 2.9 GHz, 1.81 GB memory). The software platform is Matlab. The comparison shows that the method of the present invention computes very quickly.
Finally, the method of the present application is only a preferred embodiment and is not intended to limit the protection scope of the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (7)

1. A decision table data reduction method, characterized by comprising:
Step 1: judging whether the last condition attribute in a decision table data set is a core attribute of the decision table; if it is, adding the data of that attribute to a reduction set R and executing step 2; otherwise, deleting the last column of condition attribute data and re-executing step 1;
Step 2: moving the column of data corresponding to the core attribute to the first column; once it is determined that a termination condition is met, outputting the reduction set R, otherwise returning to step 1.
2. The method of claim 1, characterized in that before step 1 it further comprises: deleting repeated samples in the decision table data set; and, if the decision table data set is inconsistent, changing the decision value of every inconsistent sample to d0, where d0 is one larger than the existing maximum decision value.
3. The method of claim 1, characterized in that before step 1 it further comprises: initializing the reduction set R.
4. The method of claim 1, characterized in that judging in step 1 whether the last condition attribute in the decision table data set is a core attribute comprises: sorting the decision table data set in ascending order; if two adjacent samples xi, xi+1 satisfy cn(xi) < cn(xi+1) and d(xi) ≠ d(xi+1) while all their other attribute values are equal, then the last condition attribute is a core attribute of the decision table, where C is the condition attribute set and d is the decision attribute.
5. The method of claim 1, characterized in that step 2 further comprises, after moving the column of data corresponding to the core attribute to the first column: deleting the positive region of the reduction set R.
6. The method of claim 1 or 2, characterized in that the termination condition in step 2 is: the decision table data set is empty, or the decision values of all remaining data are d0.
7. The method of claim 5, characterized in that deleting the positive region of the reduction set R comprises: sorting by R in ascending order and checking every pair of adjacent samples xi, xi+1; if R(xi) ≠ R(xi+1), judging whether the number of decision values in the granule containing xi is 1, and if so deleting that granule [xi]R.
CN201610996984.2A 2016-11-09 2016-11-09 A kind of decision table Data Reduction method Active CN106599049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610996984.2A CN106599049B (en) 2016-11-09 2016-11-09 A kind of decision table Data Reduction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610996984.2A CN106599049B (en) 2016-11-09 2016-11-09 A kind of decision table Data Reduction method

Publications (2)

Publication Number Publication Date
CN106599049A true CN106599049A (en) 2017-04-26
CN106599049B CN106599049B (en) 2019-08-27

Family

ID=58590290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610996984.2A Active CN106599049B (en) 2016-11-09 2016-11-09 A kind of decision table Data Reduction method

Country Status (1)

Country Link
CN (1) CN106599049B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463781A (en) * 2017-08-10 2017-12-12 中南大学 A kind of Data Reduction processing method and processing device for blast furnace molten iron silicon content forecast model
CN109657916A (en) * 2018-11-19 2019-04-19 深圳市中电数通智慧安全科技股份有限公司 A kind of Fire risk assessment method, device and server
CN109978715A (en) * 2017-12-28 2019-07-05 北京南瑞电研华源电力技术有限公司 User side distributed generation resource Data Reduction method and device
CN109992587A (en) * 2019-04-09 2019-07-09 中南大学 Blast furnace molten iron silicon content based on big data forecasts determinant attribute decision method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030215656A1 (en) * 1998-12-30 2003-11-20 Phillips Petroleum Company Process for minimizing haze of extruded clear blends containing styrene/butadiene polymer and product thereof
CN102262682A (en) * 2011-08-19 2011-11-30 上海应用技术学院 Rapid attribute reduction method based on rough classification knowledge discovery

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030215656A1 (en) * 1998-12-30 2003-11-20 Phillips Petroleum Company Process for minimizing haze of extruded clear blends containing styrene/butadiene polymer and product thereof
CN102262682A (en) * 2011-08-19 2011-11-30 上海应用技术学院 Rapid attribute reduction method based on rough classification knowledge discovery

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冯卫兵, 张梅: "Backward-deletion attribute reduction algorithm based on the core and improved conditional discernibility", Computer Applications and Software *
黄国顺: "Decision table core and attribute reduction algorithms based on database systems", Journal of Computer Applications *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463781A (en) * 2017-08-10 2017-12-12 中南大学 A kind of Data Reduction processing method and processing device for blast furnace molten iron silicon content forecast model
CN107463781B (en) * 2017-08-10 2020-02-21 中南大学 Data reduction processing method and device for blast furnace molten iron silicon content prediction model
CN109978715A (en) * 2017-12-28 2019-07-05 北京南瑞电研华源电力技术有限公司 User side distributed generation resource Data Reduction method and device
CN109657916A (en) * 2018-11-19 2019-04-19 深圳市中电数通智慧安全科技股份有限公司 A kind of Fire risk assessment method, device and server
CN109992587A (en) * 2019-04-09 2019-07-09 中南大学 Blast furnace molten iron silicon content based on big data forecasts determinant attribute decision method
CN109992587B (en) * 2019-04-09 2021-04-13 中南大学 Blast furnace molten iron silicon content prediction key attribute judgment method based on big data

Also Published As

Publication number Publication date
CN106599049B (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN111385602B (en) Video auditing method, medium and computer equipment based on multi-level and multi-model
CN106599049A (en) Decision table data reduction method
CN102629305B (en) Feature selection method facing to SNP (Single Nucleotide Polymorphism) data
Nam et al. Efficient approach for damped window-based high utility pattern mining with list structure
CN107506865A (en) A kind of load forecasting method and system based on LSSVM optimizations
Warnars Mining patterns with attribute oriented induction
CN109614520B (en) Parallel acceleration method for multi-pattern graph matching
CN114186862A (en) Entropy weight TOPSIS model-based double-layer energy performance evaluation system
Xiang et al. Double-branch fusion network with a parallel attention selection mechanism for camouflaged object detection
CN109062867A (en) Object and attribute while increased matrix Dynamic Attribute Reduction method
CN111368865B (en) Remote sensing image oil storage tank detection method and device, readable storage medium and equipment
CN103778220A (en) Decision support method and device based on cloud computing
CN102184210A (en) Stratified decision tree constructing method
CN117407533A (en) Method and device for constructing knowledge graph, and computer readable storage medium
CN112396205A (en) Method, equipment and system for optimizing complex dispersed fault block oilfield group movement sequence
CN111598239B (en) Method and device for extracting process system of article based on graph neural network
CN111737985B (en) Method and device for extracting process system from article title hierarchical structure
Bo et al. Detecting dense subgraphs in complex networks based on edge density coefficient
CN112183819A (en) Investment effectiveness key factor optimization method and system based on power transmission and distribution price supervision
Alharbi et al. Rough topologies on classical and based covering rough sets with applications in making decisions on chronic thromboembolic pulmonary hypertension
CN110211132A (en) Point cloud semantic segmentation innovatory algorithm based on slice network
Chawla et al. Implementation of association rule mining using reverse apriori algorithmic approach
CN114996256B (en) Data cleaning method based on class balance
Doreswamy et al. Similarity measuring approach for engineering materials selection
Lv et al. Text information retrieval based on concept semantic similarity

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant