Article DOI: 10.1145/1081870.1081902

Sampling-based sequential subgroup mining

Published: 21 August 2005

Abstract

Subgroup discovery is a learning task that aims to find interesting rules in classified examples. The search is guided by a utility function that trades off the coverage of rules against their statistical unusualness. One shortcoming of existing approaches is that they do not incorporate prior knowledge. To address this, a novel generic sampling strategy is proposed that turns pattern mining into an iterative process. In each iteration, subgroup discovery focuses on those patterns that are unexpected with respect to prior knowledge and previously discovered patterns. The result is a small, diverse set of understandable rules that characterise a specified property of interest. As a further contribution, this article derives a simple connection between subgroup discovery and classifier induction. For a popular utility function, this connection allows any standard rule induction algorithm to be applied to subgroup discovery after a step of stratified resampling. The proposed techniques are empirically compared to state-of-the-art subgroup discovery algorithms.
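
The connection to classifier induction can be illustrated with a small sketch. It assumes that the utility function in question is weighted relative accuracy, WRAcc(A -> C) = P(A, C) - P(A) P(C), which the abstract does not name, and it replaces actual resampling with example weights that give both classes the same total mass. Under those assumptions the stratified accuracy of a rule equals 2 * WRAcc + 1/2, so a learner that maximises accuracy on the stratified data ranks rules in the same order as WRAcc. The toy data, the rule, and all function names are hypothetical.

```python
from fractions import Fraction

def wracc(examples, weights, rule):
    """Weighted relative accuracy P(A, C) - P(A) * P(C) for the rule A -> C."""
    total = sum(weights)
    p_a = sum(w for (x, _), w in zip(examples, weights) if rule(x)) / total
    p_c = sum(w for (_, y), w in zip(examples, weights) if y) / total
    p_ac = sum(w for (x, y), w in zip(examples, weights) if rule(x) and y) / total
    return p_ac - p_a * p_c

def accuracy(examples, weights, rule):
    """Weighted accuracy of the classifier 'predict C exactly when A holds'."""
    total = sum(weights)
    return sum(w for (x, y), w in zip(examples, weights) if rule(x) == y) / total

def stratify(examples):
    """Weights under which each class carries total mass 1/2.

    Assumption: reweighting stands in for stratified resampling here,
    which keeps the example deterministic.
    """
    n_pos = sum(1 for _, y in examples if y)
    n_neg = len(examples) - n_pos
    return [Fraction(1, 2 * n_pos) if y else Fraction(1, 2 * n_neg)
            for _, y in examples]

# Hypothetical toy data: (feature value, belongs to the target class C).
examples = [(5, True), (4, True), (3, True), (1, True),
            (4, False), (3, False), (2, False), (2, False),
            (1, False), (1, False), (0, False), (0, False)]

def rule(x):
    """Hypothetical subgroup description A: feature value is at least 3."""
    return x >= 3

uniform = [Fraction(1)] * len(examples)
balanced = stratify(examples)
p_pos = Fraction(sum(1 for _, y in examples if y), len(examples))

wracc_orig = wracc(examples, uniform, rule)
wracc_strat = wracc(examples, balanced, rule)
acc_strat = accuracy(examples, balanced, rule)

# On stratified data, accuracy is an affine function of WRAcc.
print(acc_strat == 2 * wracc_strat + Fraction(1, 2))
# Stratification rescales WRAcc by a constant, so rule rankings are unchanged.
print(wracc_orig == 4 * p_pos * (1 - p_pos) * wracc_strat)
```

Exact fractions are used instead of floats so that both identities can be checked with equality rather than a tolerance.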

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. prior knowledge
  2. sampling
  3. subgroup discovery

Qualifiers

  • Article

Conference

KDD05

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2017) A new evolutionary algorithm for mining top-k discriminative patterns in high dimensional data. Applied Soft Computing 59:C (487-499). DOI: 10.1016/j.asoc.2017.05.048. Online publication date: 1-Oct-2017
  • (2016) Ensembles of Interesting Subgroups for Discovering High Potential Employees. Proceedings, Part II, of the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume 9652 (208-220). DOI: 10.1007/978-3-319-31750-2_17. Online publication date: 19-Apr-2016
  • (2014) Interesting Subset Discovery and Its Application on Service Processes. Data Mining for Service (245-269). DOI: 10.1007/978-3-642-45252-9_14. Online publication date: 4-Jan-2014
  • (2012) A Sequential Sampling Framework for Spectral k-Means Based on Efficient Bootstrap Accuracy Estimations. ACM Transactions on Knowledge Discovery from Data 6:2 (1-37). DOI: 10.1145/2297456.2297457. Online publication date: 1-Jul-2012
  • (2012) Semi-supervised clustering. Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition (252-263). DOI: 10.1007/978-3-642-31537-4_20. Online publication date: 13-Jul-2012
  • (2009) Ranking interesting subgroups. Proceedings of the 26th Annual International Conference on Machine Learning (913-920). DOI: 10.1145/1553374.1553491. Online publication date: 14-Jun-2009
  • (2008) Rule cubes for causal investigations. Knowledge and Information Systems 18:1 (109-132). DOI: 10.1007/s10115-008-0141-7. Online publication date: 17-May-2008
  • (2007) Rule Cubes for Causal Investigations. Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (53-62). DOI: 10.1109/ICDM.2007.29. Online publication date: 28-Oct-2007
  • (2006) YALE. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (935-940). DOI: 10.1145/1150402.1150531. Online publication date: 20-Aug-2006
  • (2006) Polynomial association rules with applications to logistic regression. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (586-591). DOI: 10.1145/1150402.1150472. Online publication date: 20-Aug-2006
