Quantifiable data mining using ratio rules

Published: 01 February 2000

Abstract

Association Rule Mining algorithms operate on a data matrix (e.g., customers $\times$ products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing error” as a measure of the “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess” the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting, answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods, which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” up to 5 times lower than that obtained with straightforward column averages.
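
The guessing error described above can be illustrated with a minimal Python sketch. It hides one cell at a time, reconstructs it, and reports the root-mean-square error, comparing a ratio-based guess against a column-average baseline. The spending matrix, the single hand-picked ratio vector, and the helper names (`guessing_error`, `column_average_guess`, `make_ratio_guess`) are assumptions made for illustration. The abstract does not specify how the rules are derived or how hidden values are guessed, so the least-squares scaling used here is a stand-in, not the paper's method.

```python
import math

def guessing_error(matrix, guess):
    """RMSE over all cells, hiding one cell at a time and reconstructing it."""
    squared, count = 0.0, 0
    for i, row in enumerate(matrix):
        for j, actual in enumerate(row):
            estimate = guess(matrix, i, j)  # guess must not read matrix[i][j]
            squared += (estimate - actual) ** 2
            count += 1
    return math.sqrt(squared / count)

def column_average_guess(matrix, i, j):
    """Baseline (assumed leave-one-out): mean of column j over the other rows."""
    others = [row[j] for k, row in enumerate(matrix) if k != i]
    return sum(others) / len(others)

def make_ratio_guess(rule):
    """Stand-in guesser: scale one fixed ratio vector to fit the known cells."""
    def guess(matrix, i, j):
        known = [(x, r) for k, (x, r) in enumerate(zip(matrix[i], rule)) if k != j]
        # Least-squares scale factor of the rule against the visible cells.
        scale = sum(x * r for x, r in known) / sum(r * r for _, r in known)
        return scale * rule[j]
    return guess

# Hypothetical data: dollars spent on milk, bread, butter (one row per customer).
spending = [
    [10.0, 3.0, 2.0],
    [20.0, 6.0, 4.0],
    [5.0, 1.5, 1.0],
    [15.0, 4.5, 3.0],
]
rule = [10.0, 3.0, 2.0]  # assumed ratio milk : bread : butter

ratio_error = guessing_error(spending, make_ratio_guess(rule))
baseline_error = guessing_error(spending, column_average_guess)

# "What-if": a customer spent $10 on milk and $3 on bread; butter (index 2) is hidden.
butter = make_ratio_guess(rule)([[10.0, 3.0, 0.0]], 0, 2)
print(ratio_error, baseline_error, butter)
```

Because the toy rows are exact multiples of the assumed ratio, the ratio-based guess reconstructs every cell, while the column-average baseline does not. Real data would leave a nonzero error for both, and the comparison between the two is what the guessing error is meant to quantify.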

      Published In

      The VLDB Journal — The International Journal on Very Large Data Bases, Volume 8, Issue 3-4
      February 2000
      184 pages

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Author Tags

      1. Data mining
      2. Forecasting
      3. Guessing error
      4. Knowledge discovery

      Qualifiers

      • Article

      Cited By

      • (2015) Image processing-aided working posture analysis. Computers and Industrial Engineering 85:C (384-394). DOI: 10.1016/j.cie.2015.03.011. Online publication date: 1-Jul-2015
      • (2012) Diverse dimension decomposition for itemset spaces. Knowledge and Information Systems 33:2 (447-473). DOI: 10.1007/s10115-012-0518-5. Online publication date: 1-Nov-2012
      • (2011) Online updating the generalized inverse of centered matrices. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (1826-1827). DOI: 10.5555/2900423.2900715. Online publication date: 7-Aug-2011
      • (2011) Are tensor decomposition solutions unique? on the Global convergence HOSVD and parafac algorithms. Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I (148-159). DOI: 10.5555/2017863.2017878. Online publication date: 24-May-2011
      • (2010) Re-mining positive and negative association mining results. Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects (101-114). DOI: 10.5555/1880672.1880682. Online publication date: 12-Jul-2010
      • (2009) Measures of ruleset quality for general rules extraction methods. International Journal of Approximate Reasoning 50:6 (867-879). DOI: 10.1016/j.ijar.2009.03.002. Online publication date: 1-Jun-2009
      • (2008) Incremental tensor analysis. ACM Transactions on Knowledge Discovery from Data 2:3 (1-37). DOI: 10.1145/1409620.1409621. Online publication date: 27-Oct-2008
      • (2007) Dissemination of compressed historical information in sensor networks. The VLDB Journal — The International Journal on Very Large Data Bases 16:4 (439-461). DOI: 10.1007/s00778-005-0173-5. Online publication date: 1-Oct-2007
      • (2007) Measures of Ruleset Quality Capable to Represent Uncertain Validity. Proceedings of the 9th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (430-442). DOI: 10.1007/978-3-540-75256-1_39. Online publication date: 31-Oct-2007
      • (2006) Beyond streams and graphs. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (374-383). DOI: 10.1145/1150402.1150445. Online publication date: 20-Aug-2006