Article
DOI: 10.5555/647915.738871

Pattern Detection and Discovery

Published: 16 September 2002

Abstract

Data mining comprises two subdisciplines. One of these is based on statistical modelling, though the large data sets associated with data mining lead to new problems for traditional modelling methodology. The other, which we term pattern detection, is a new science. Pattern detection is concerned with defining and detecting local anomalies within large data sets, and tools and methods have been developed in parallel by several applications communities, typically with no awareness of developments elsewhere. Most of the work to date has focussed on the development of practical methodology, with little attention being paid to the development of an underlying theoretical base to parallel the theoretical base developed over the last century to underpin modelling approaches. We suggest that the time is now right for the development of a theoretical base, so that important common aspects of the work can be identified, so that key directions for future research can be characterised, and so that the various application domains can benefit from the work in other areas. We attempt to describe a unified approach to the subject, and also attempt to provide a theoretical base on which future developments can stand.




Published In

Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery
September 2002
226 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

• (2020) Mining the Local Dependency Itemset in a Products Network. ACM Transactions on Management Information Systems 11:1, pp. 1-31. DOI: 10.1145/3384473. Online publication date: 17-Apr-2020
• (2017) Finding Spatiotemporal Co-occurrence Patterns of Heterogeneous Events for Prediction. Proceedings of the 3rd ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management, pp. 1-8. DOI: 10.1145/3152465.3152475. Online publication date: 7-Nov-2017
• (2017) Exceptionally monotone models--the rank correlation model class for Exceptional Model Mining. Knowledge and Information Systems 51:2, pp. 369-394. DOI: 10.1007/s10115-016-0979-z. Online publication date: 1-May-2017
• (2014) An approach for increasing the level of accuracy in supply chain simulation by using patterns on input data. Proceedings of the 2014 Winter Simulation Conference, pp. 1897-1906. DOI: 10.5555/2693848.2694087. Online publication date: 7-Dec-2014
• (2014) 20 years of pattern mining. ACM SIGKDD Explorations Newsletter 15:1, pp. 41-50. DOI: 10.1145/2594473.2594480. Online publication date: 17-Mar-2014
• (2012) Linear space direct pattern sampling using coupling from the past. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 69-77. DOI: 10.1145/2339530.2339545. Online publication date: 12-Aug-2012
• (2011) A relational view of pattern discovery. Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I, pp. 153-167. DOI: 10.5555/1997305.1997323. Online publication date: 22-Apr-2011
• (2011) Direct local pattern sampling by efficient two-step random procedures. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 582-590. DOI: 10.1145/2020408.2020500. Online publication date: 21-Aug-2011
• (2010) Combining CSP and constraint-based mining for pattern discovery. Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part II, pp. 432-447. DOI: 10.1007/978-3-642-12165-4_35. Online publication date: 23-Mar-2010
• (2009) Agglomerating local patterns hierarchically with ALPHA. Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1753-1756. DOI: 10.1145/1645953.1646222. Online publication date: 2-Nov-2009
