Cite this article
Liu, H., Motoda, H. On Issues of Instance Selection. Data Mining and Knowledge Discovery 6, 115–130 (2002). https://doi.org/10.1023/A:1014056429969
DOI: https://doi.org/10.1023/A:1014056429969