[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1007/978-3-642-41398-8_20guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Diversity-Driven Widening

Published: 17 October 2013 Publication History

Abstract

This paper follows our earlier publicationï ź[1], where we introduced the idea of tuned data mining which draws on parallel resources to improve model accuracy rather than the usual focus on speed-up. In this paper we present a more in-depth analysis of the concept of Widened Data Mining, which aims at reducing the impact of greedy heuristics by exploring more than just one suitable solution at each step. In particular we focus on how diversity considerations can substantially improve results. We again use the greedy algorithm for the set cover problem to demonstrate these effects in practice.

References

[1]
Akbar, Z., Ivanova, V.N., Berthold, M.R.: Parallel data mining revisited. Better, not faster. In: Hollmén, J., Klawonn, F., Tucker, A. eds. IDA 2012. LNCS, vol. 7619, pp. 23---34. Springer, Heidelberg 2012
[2]
Akl, S.G.: Parallel real-time computation: Sometimes quantity means quality. Computing and Informatics 21, 455---487 2002
[3]
Kumar, V.: Special Issue on High-performance Data Mining. Academic Press 2001
[4]
Kargupta, H., Chan, P.: Advances in Distributed and Parallel Knowledge Discovery. AAAI/MIT Press 2000
[5]
Zaki, M.J., Ho, C.-T. eds.: KDD 1999. LNCS LNAI, vol. 1759. Springer, Heidelberg 2000
[6]
Zaki, M.J., Pan, Y.: Introduction: Recent developments in parallel and distributed data mining. DPD 112, 123---127 2002
[7]
Shafer, J., Agrawal, R., Mehta, M.: Sprint: A scalable parallel classifier for data mining. In: VLDB, pp. 544---555 1996
[8]
Zaki, M.J., Ho, C.-T., Agrawal, R.: Parallel classification for data mining on shared-memory multiprocessors. In: ICDE, pp. 198---205 1999
[9]
Darlington, J., Guo, Y.-K., Sutiwaraphun, J., To, H.W.: Parallel induction algorithms for data mining. In: Liu, X., Cohen, P., Berthold, M. eds. IDA 1997. LNCS, vol. 1280, pp. 437---445. Springer, Heidelberg 1997
[10]
Srivastava, A., Han, E.-H., Kumar, V., Singh, V.: Parallel formulations of decision-tree classification algorithms. DMKD 33, 237---261 1999
[11]
Kufrin, R.: Decision trees on parallel processors. In: PPAI, pp. 279---306 1995
[12]
Zaki, M.J.: Parallel and distributed association mining: a survey. IEEE Concurrency 74, 14---25 1999
[13]
Judd, D., McKinley, P.K., Jain, A.K.: Large-scale parallel data clustering. TPAMI 208, 871---876 1998
[14]
Dhillon, I., Modha, D.: A data-clustering algorithm on distributed memory multiprocessors. In: Large-scale Parallel KDD Systems Workshop, ACM SIGKDD, pp. 245---260 2000
[15]
Olson, C.F.: Parallel algorithms for hierarchical clustering. JPC 21, 1313---1325 1995
[16]
Garg, A., Mangla, A., Gupta, N., Bhatnagar, V.: PBIRCH: A scalable parallel clustering algorithm for incremental data. In: IDEAS, pp. 315---316 2006
[17]
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 511, 107---113 2008
[18]
Chu, C.-T., Kim, S.K., Lin, Y.-A., Yu, Y.Y., Bradski, G.R., Ng, A.Y., Olukotun, K.: Map-reduce for machine learning on multicore. In: NIPS, pp. 281---288 2006
[19]
Ma, Z., Gu, L.: The limitation of MapReduce: A probing case and a lightweight solution. In: Intl. Conf. on Cloud Computing, GRIDs, and Virtualization, pp. 68---73 2010
[20]
Breiman, L.: Bagging predictors. JML 242, 123---140 1996
[21]
Schapire, R.E.: The strength of weak learnability. JML 5, 28---33 1990
[22]
Breiman, L.: Random forests. JML 451, 5---32 2001
[23]
Talia, D.: Parallelism in knowledge discovery techniques. In: Fagerholm, J., Haataja, J., Järvinen, J., Lyly, M., Råback, P., Savolainen, V. eds. PARA 2002. LNCS, vol. 2367, pp. 127---136. Springer, Heidelberg 2002
[24]
Shell, P., Rubio, J.A.H., Barro, G.Q.: Improving search through diversity. In: AAAI 1994
[25]
Harvey, W.D., Ginsberg, M.L.: Limited discrepancy search. IJCAI, 607---615 1995
[26]
Felner, A., Kraus, S., Korf, R.E.: KBFS: K-best-first search. AMAI 391-2, 19---39 2003
[27]
Berger, B., Rompel, J., Shor, P.W.: Efficient nc algorithms for set cover with applications to learning and geometry. JCSS 493, 454---477 1994
[28]
Blelloch, G.E., Peng, R., Tangwongsan, K.: Linear-work greedy parallel approximate set cover and variants. In: SPAA, pp. 23---32 2011
[29]
Johnson, D.S.: Approximation algorithms for combinatorial problems. In: STOC, pp. 38---49 1973
[30]
Beasley, J.E.: Or-library: Distributing test problems by electronic mail. The Journal of the Operational Research Society 4111, 1069---1072 1990

Cited By

View all
  • (2024)Language-Model Based Informed Partition of Databases to Speed Up Pattern MiningProceedings of the ACM on Management of Data10.1145/36549872:3(1-27)Online publication date: 30-May-2024
  • (2018)Neighborhood-based Strategies for Widening of the Greedy Algorithm of the Set Cover ProblemProceedings of the 19th International Conference on Computer Systems and Technologies10.1145/3274005.3274036(27-32)Online publication date: 13-Sep-2018
  • (2018)Communication-less Strategies for the Widening of Rule InductionProceedings of the 19th International Conference on Computer Systems and Technologies10.1145/3274005.3274033(33-37)Online publication date: 13-Sep-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
IDA 2013: Proceedings of the 12th International Symposium on Advances in Intelligent Data Analysis XII - Volume 8207
October 2013
461 pages
ISBN:9783642413971
  • Editors:
  • Allan Tucker,
  • Frank Höppner,
  • Arno Siebes,
  • Stephen Swift

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 17 October 2013

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 27 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Language-Model Based Informed Partition of Databases to Speed Up Pattern MiningProceedings of the ACM on Management of Data10.1145/36549872:3(1-27)Online publication date: 30-May-2024
  • (2018)Neighborhood-based Strategies for Widening of the Greedy Algorithm of the Set Cover ProblemProceedings of the 19th International Conference on Computer Systems and Technologies10.1145/3274005.3274036(27-32)Online publication date: 13-Sep-2018
  • (2018)Communication-less Strategies for the Widening of Rule InductionProceedings of the 19th International Conference on Computer Systems and Technologies10.1145/3274005.3274033(33-37)Online publication date: 13-Sep-2018
  • (2017)Bucket Selection: A Model-Independent Diverse Selection Strategy for WideningAdvances in Intelligent Data Analysis XVI10.1007/978-3-319-68765-0_8(87-98)Online publication date: 26-Oct-2017

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media