Article

Sampling-based sequential subgroup mining

Author:

Martin Scholz

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Pages 265 - 274

https://doi.org/10.1145/1081870.1081902

Published: 21 August 2005

Abstract

Subgroup discovery is a learning task that aims to find interesting rules from classified examples. The search is guided by a utility function that trades off the coverage of rules against their statistical unusualness. One shortcoming of existing approaches is that they do not incorporate prior knowledge. To address this, a novel generic sampling strategy is proposed that turns pattern mining into an iterative process. In each iteration, subgroup discovery focuses on those patterns that are unexpected with respect to prior knowledge and previously discovered patterns. The result of this technique is a small, diverse set of understandable rules that characterise a specified property of interest. As a further contribution, this article derives a simple connection between subgroup discovery and classifier induction. For a popular utility function, this connection allows any standard rule induction algorithm to be applied to subgroup discovery after a step of stratified resampling. The proposed techniques are empirically compared to state-of-the-art subgroup discovery algorithms.
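The abstract does not name the utility function or spell out the sampling step, so the following Python sketch illustrates both ideas under explicit assumptions; it is not the paper's implementation. It assumes the utility is weighted relative accuracy, WRAcc(A → Y) = P(A) · (P(Y | A) − P(Y)), a common measure that multiplies a rule's coverage P(A) by its gain in target probability. It models "factoring out" a discovered rule as a hypothetical reweighting that makes the rule body A and the target Y independent while keeping P(A) and P(Y), so that the rule scores zero in the next iteration. Finally, it stands in for stratified resampling with a reweighting that gives both classes equal mass. Under P(Y) = 1/2, WRAcc = (P(A, Y) − P(A, ¬Y)) / 2, and the accuracy of the classifier "predict Y iff A" is P(A, Y) + P(¬A, ¬Y) = 1/2 + P(A, Y) − P(A, ¬Y) = 1/2 + 2 · WRAcc, so ranking rules by accuracy ranks them by WRAcc. The function names `wracc`, `factor_out`, `stratify` and `rule_accuracy` are illustrative only.

```python
from typing import List, Sequence


def wracc(body: Sequence[bool], target: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted relative accuracy P(A) * (P(Y|A) - P(Y)) of the rule A -> Y (assumed utility)."""
    total = sum(weights)
    p_a = sum(w for a, w in zip(body, weights) if a) / total
    p_y = sum(w for y, w in zip(target, weights) if y) / total
    p_ay = sum(w for a, y, w in zip(body, target, weights) if a and y) / total
    # P(A) * (P(Y|A) - P(Y)) equals P(A,Y) - P(A) * P(Y); this form never divides by P(A)
    return p_ay - p_a * p_y


def factor_out(body: Sequence[bool], target: Sequence[bool], weights: Sequence[float]) -> List[float]:
    """Hypothetical reweighting step: make rule body A and target Y independent.

    Each example in cell (a, y) of the 2x2 contingency table is scaled by
    P(a) * P(y) / P(a, y). This keeps the marginals P(A) and P(Y) and the
    total weight unchanged. Assumes strictly positive weights.
    """
    total = sum(weights)
    joint = {(a, y): 0.0 for a in (False, True) for y in (False, True)}
    for a, y, w in zip(body, target, weights):
        joint[(a, y)] += w / total
    p_a = {a: joint[(a, False)] + joint[(a, True)] for a in (False, True)}
    p_y = {y: joint[(False, y)] + joint[(True, y)] for y in (False, True)}
    return [w * p_a[a] * p_y[y] / joint[(a, y)] for a, y, w in zip(body, target, weights)]


def stratify(target: Sequence[bool], weights: Sequence[float]) -> List[float]:
    """Rescale weights so that each class carries half of the total weight.

    Reweighting stands in for stratified resampling here; assumes both classes occur.
    """
    total = sum(weights)
    pos = sum(w for y, w in zip(target, weights) if y)
    neg = total - pos
    return [w * total / (2 * pos) if y else w * total / (2 * neg) for y, w in zip(target, weights)]


def rule_accuracy(body: Sequence[bool], target: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted accuracy of the classifier 'predict Y if A holds, otherwise predict not-Y'."""
    return sum(w for a, y, w in zip(body, target, weights) if a == y) / sum(weights)


if __name__ == "__main__":
    # Toy data: body[i] says whether example i is covered by rule A, target[i] is its class.
    body = [True, True, True, False, False, False, False, True]
    target = [True, True, False, True, False, False, False, False]
    w = [1.0] * len(body)

    print(wracc(body, target, w))  # 0.0625
    # After factoring the rule out, it is no longer unexpected (WRAcc close to 0).
    print(wracc(body, target, factor_out(body, target, w)))
    # With balanced classes, accuracy equals 0.5 + 2 * WRAcc up to rounding.
    ws = stratify(target, w)
    print(rule_accuracy(body, target, ws), 0.5 + 2 * wracc(body, target, ws))
```

Running the script prints 0.0625 for the toy rule, a value near zero once that rule has been factored out, and two matching numbers after stratification. Reweighting was chosen over drawing a new sample only to keep the sketch deterministic; a resampling variant would draw examples with probability proportional to these weights.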

Index Terms

Sampling-based sequential subgroup mining
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Logical and relational learning
        Inductive logic learning
2. Information systems
  1. Information systems applications
    1. Data mining

Information & Contributors

Information

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

August 2005

844 pages

ISBN: 159593135X

DOI: 10.1145/1081870

General Chair: Robert Grossman, University of Illinois at Chicago & Open Data Partners, USA

Program Chairs: Roberto Bayardo, IBM Almaden Research, USA; Kristin Bennett, RPI, USA

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

KDD05

Sponsor:

KDD05: The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21 - 24, 2005

Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Article Metrics

Total Citations: 12
Total Downloads: 795
Downloads (last 12 months): 4
Downloads (last 6 weeks): 0

Reflects downloads up to 09 Jan 2025.

Cited By

Lucas T, Silva T, Vimieiro R, Ludermir T (2017). A new evolutionary algorithm for mining top-k discriminative patterns in high dimensional data. Applied Soft Computing 59:C, pp. 487-499. DOI: 10.1016/j.asoc.2017.05.048. Online publication date: 1-Oct-2017.
https://dl.acm.org/doi/10.1016/j.asoc.2017.05.048
Palshikar G, Sahu K, Srivastava R (2016). Ensembles of Interesting Subgroups for Discovering High Potential Employees. Proceedings, Part II, of the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume 9652, pp. 208-220. DOI: 10.1007/978-3-319-31750-2_17. Online publication date: 19-Apr-2016.
https://dl.acm.org/doi/10.1007/978-3-319-31750-2_17
Natu M, Palshikar G (2014). Interesting Subset Discovery and Its Application on Service Processes. Data Mining for Service, pp. 245-269. DOI: 10.1007/978-3-642-45252-9_14. Online publication date: 4-Jan-2014.
https://doi.org/10.1007/978-3-642-45252-9_14
Mavroeidis D, Magdalinos P (2012). A Sequential Sampling Framework for Spectral k-Means Based on Efficient Bootstrap Accuracy Estimations. ACM Transactions on Knowledge Discovery from Data 6:2, pp. 1-37. DOI: 10.1145/2297456.2297457. Online publication date: 1-Jul-2012.
https://dl.acm.org/doi/10.1145/2297456.2297457
Silva A, Antunes C (2012). Semi-supervised clustering. Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition, pp. 252-263. DOI: 10.1007/978-3-642-31537-4_20. Online publication date: 13-Jul-2012.
https://dl.acm.org/doi/10.1007/978-3-642-31537-4_20
Rueping S (2009). Ranking interesting subgroups. In Danyluk A, Bottou L, Littman M (eds.), Proceedings of the 26th Annual International Conference on Machine Learning, pp. 913-920. DOI: 10.1145/1553374.1553491. Online publication date: 14-Jun-2009.
https://dl.acm.org/doi/10.1145/1553374.1553491
Blumenstock A, Schweiggert F, Müller M, Lanquillon C (2008). Rule cubes for causal investigations. Knowledge and Information Systems 18:1, pp. 109-132. DOI: 10.1007/s10115-008-0141-7. Online publication date: 17-May-2008.
https://doi.org/10.1007/s10115-008-0141-7
Blumenstock A, Schweiggert F, Muller M (2007). Rule Cubes for Causal Investigations. Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, pp. 53-62. DOI: 10.1109/ICDM.2007.29. Online publication date: 28-Oct-2007.
https://dl.acm.org/doi/10.1109/ICDM.2007.29
Mierswa I, Wurst M, Klinkenberg R, Scholz M, Euler T (2006). YALE. In Eliassi-Rad T, Ungar L, Craven M, Gunopulos D (eds.), Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 935-940. DOI: 10.1145/1150402.1150531. Online publication date: 20-Aug-2006.
https://dl.acm.org/doi/10.1145/1150402.1150531
Jaroszewicz S (2006). Polynomial association rules with applications to logistic regression. In Eliassi-Rad T, Ungar L, Craven M, Gunopulos D (eds.), Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 586-591. DOI: 10.1145/1150402.1150472. Online publication date: 20-Aug-2006.
https://dl.acm.org/doi/10.1145/1150402.1150472
