More Web Proxy on the site http://driver.im/

research-article

Using trees to depict a forest

Authors:

H. V. JagadishAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 2, Issue 1

Pages 133 - 144

https://doi.org/10.14778/1687627.1687643

Published: 01 August 2009 Publication History

Abstract

When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the "best" results appear first. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined.

An alternative approach to the first page is not to try to show the best results, but instead to help users learn what is available in the whole result set and direct them to finding what they need. In this paper, we demonstrate through a user study that a page comprising one representative from each of k clusters (generated through a k-medoid clustering) is superior to multiple alternative candidate methods for generating representatives of a data set.

Users often refine query specifications based on returned results. Traditional clustering may lead to completely new representatives after a refinement step. Furthermore, clustering can be computationally expensive. We propose a tree-based method for efficiently generating the representatives, and smoothly adapting them with query refinement. Experiments show that our algorithms outperform the state-of-the-art in both result quality and efficiency.

References

[1]

Database usability research at university of michigan. http://www.eecs.umich.edu/db/usable/.

[2]

E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR, pages 3--10, 2006.

Digital Library

[3]

R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD Conference, pages 94--105, 1998.

Digital Library

[4]

A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In ICML, pages 97--104, 2006.

Digital Library

[5]

S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic information retrieval approach for ranking of database query results. ACM Trans. Database Syst., 31(3):1134--1168, 2006.

Digital Library

[6]

M. Ester, H. Kriegel, and X. Xu. A Database Interface for Clustering in Large Spatial Databases. Inst. für Informatik, 1995.

[7]

M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.

Digital Library

[8]

M. Ester, H.-P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification. In SSD, pages 67--82, 1995.

Digital Library

[9]

R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001.

Digital Library

[10]

M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness. WH Freeman San Francisco, 1979.

Digital Library

[11]

S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In SIGMOD Conference, pages 73--84, 1998.

Digital Library

[12]

M. Hua, J. Pei, A. W.-C. Fu, X. Lin, and H. fung Leung. Efficiently answering top-k typicality queries on large databases. In VLDB, pages 890--901, 2007.

Digital Library

[13]

B. J. Jansen and A. Spink. How are we searching the world wide web? a comparison of nine search engine transaction logs. Inf. Process. Manage., 42(1):248--263, 2006.

Digital Library

[14]

S. Kandel, A. Paepcke, M. Theobald, H. Garcia-Molina, and E. Abelson. Photospread: a spreadsheet for managing photos. In CHI, pages 1749--1758, 2008.

Digital Library

[15]

L. Kaufman and P. Rousseeuw. Finding groups in data. an introduction to cluster analysis. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics, New York: Wiley, 1990.

[16]

C. Li, M. Wang, L. Lim, H. Wang, and K. C.-C. Chang. Supporting ranking and clustering as generalized order-by and group-by. In SIGMOD Conference, pages 127--138, 2007.

Digital Library

[17]

B. Liu and H. Jagadish. A spreadsheet algebra for a direct data manipulation query interface. In ICDE, 2009.

Digital Library

[18]

B. Liu and H. V. Jagadish. Datalens: Making a good first impression. In SIGMOD Conference, Demonstration Track, 2009.

Digital Library

[19]

B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Trans. Knowl. Data Eng., 13(1):124--141, 2001.

Digital Library

[20]

F. Morii. A generalized k-means algorithm with semi-supervised weight coefficients. In ICPR (3), pages 198--201, 2006.

Digital Library

[21]

K. Mouratidis, D. Papadias, and S. Papadimitriou. Medoid queries in large spatial databases. In SSTD, pages 55--72, 2005.

Digital Library

[22]

K. Mouratidis, D. Papadias, and S. Papadimitriou. Tree-based partition querying: a methodology for computing medoids in large spatial datasets. VLDB J., 17(4):923--945, 2008.

Digital Library

[23]

R. T. Ng and J. Han. Clarans: A method for clustering objects for spatial data mining. IEEE Trans. on Knowl. and Data Eng., 14(5):1003--1016, 2002.

Digital Library

[24]

R. Nosofsky and S. Zaki. Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Learning, Memory, 28(5):924--940, 2002.

[25]

C. R. Palmer and C. Faloutsos. Density biased sampling: An improved method for data mining and clustering. In SIGMOD Conference, pages 82--92, 2000.

Digital Library

[26]

F. Pan, W. Wang, A. K. H. Tung, and J. Yang. Finding representative set from massive data. In ICDM, pages 338--345, 2005.

Digital Library

[27]

L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1):90--105, 2004.

Digital Library

[28]

H. Shin and R. Nosofsky. Similarity-scaling studies of dot-pattern classification and recognition. Journal of Experimental Psychology: General, 121(3):278--304, 1992.

[29]

B. Shneiderman. Direct manipulation: a step beyond programming languages. IEEE Computer, 16(8):57--69, 1983.

Digital Library

[30]

J. Smith and J. Minda. Prototypes in the mist: The early epochs of category learning. Learning, Memory, 24(6):1411--1436, 1998.

[31]

E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. Amer-Yahia. Efficient computation of diverse query results. In ICDE, pages 228--236, 2008.

Digital Library

[32]

T. Wu, X. Li, D. Xin, J. Han, J. Lee, and R. Redder. Datascope: Viewing database contents in google maps' way. In VLDB, pages 1314--1317, 2007.

Digital Library

[33]

R. Xu and I. Donald Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645, 2005.

Digital Library

[34]

T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An efficient data clustering method for very large databases. In SIGMOD Conference, pages 103--114, 1996.

Digital Library

Cited By

Razmadze KAmsterdamer YSomech ADavidson SMilo TIves ZBonifati AEl Abbadi A(2022)SubTab: Data Exploration with Informative Sub-TablesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3520154(2369-2372)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3520154
Drosou MPitoura E(2015)Multiple Radii DisC DiversityACM Transactions on Database Systems10.1145/269949940:1(1-43)Online publication date: 25-Mar-2015
https://dl.acm.org/doi/10.1145/2699499
Singh M(2015)Information Exploration in E-Commerce DatabasesProceedings of the 4th International Conference on Big Data Analytics - Volume 949810.1007/978-3-319-27057-9_3(41-56)Online publication date: 15-Dec-2015
https://dl.acm.org/doi/10.1007/978-3-319-27057-9_3
Show More Cited By

Index Terms

Using trees to depict a forest

Recommendations

Pruning trees in C-fuzzy random forest
Abstract
Pruning decision trees is the way to decrease their size in order to reduce classification time and improve (or at least maintain) classification accuracy. In this paper, the idea of applying different pruning methods to C-fuzzy decision trees and ...
Pruning optimum-path forest ensembles using metaheuristic optimization for land-cover classification

Machine learning techniques have been actively pursued in the last years, mainly due to the increasing number of applications that make use of some sort of intelligent mechanism for decision-making processes. In this context, we shall highlight ensemble ...
Mining patterns for clustering on numerical datasets using unsupervised decision trees

Pattern-based clustering algorithms return a set of patterns that describe the objects of each cluster. The most recent algorithms proposed in this approach extract patterns on numerical datasets by applying an a priori discretization process, which may ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 2, Issue 1

August 2009

1293 pages

ISSN:2150-8097

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 August 2009

Published in PVLDB Volume 2, Issue 1

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

15
Total Citations
View Citations
227
Total Downloads

Downloads (Last 12 months)6
Downloads (Last 6 weeks)2

Reflects downloads up to 14 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Razmadze KAmsterdamer YSomech ADavidson SMilo TIves ZBonifati AEl Abbadi A(2022)SubTab: Data Exploration with Informative Sub-TablesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3520154(2369-2372)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3520154
Drosou MPitoura E(2015)Multiple Radii DisC DiversityACM Transactions on Database Systems10.1145/269949940:1(1-43)Online publication date: 25-Mar-2015
https://dl.acm.org/doi/10.1145/2699499
Singh M(2015)Information Exploration in E-Commerce DatabasesProceedings of the 4th International Conference on Big Data Analytics - Volume 949810.1007/978-3-319-27057-9_3(41-56)Online publication date: 15-Dec-2015
https://dl.acm.org/doi/10.1007/978-3-319-27057-9_3
Küçüktunç OSaule EKaya KÇatalyürek Ü(2014)Diversifying Citation RecommendationsACM Transactions on Intelligent Systems and Technology10.1145/26681065:4(1-21)Online publication date: 15-Dec-2014
https://dl.acm.org/doi/10.1145/2668106
Eravci BFerhatosmanoglu H(2013)Diversity based relevance feedback for time series searchProceedings of the VLDB Endowment10.14778/2732228.27322307:2(109-120)Online publication date: 1-Oct-2013
https://dl.acm.org/doi/10.14778/2732228.2732230
Drosou MPitoura E(2012)DisC diversityProceedings of the VLDB Endowment10.14778/2428536.24285386:1(13-24)Online publication date: 1-Nov-2012
https://dl.acm.org/doi/10.14778/2428536.2428538
Drosou MPitoura E(2012)Dynamic diversification of continuous dataProceedings of the 15th International Conference on Extending Database Technology10.1145/2247596.2247623(216-227)Online publication date: 27-Mar-2012
https://dl.acm.org/doi/10.1145/2247596.2247623
Singh MNandi AJagadish HCandan KChen YSnodgrass RGravano LFuxman A(2012)SkimmerProceedings of the 2012 ACM SIGMOD International Conference on Management of Data10.1145/2213836.2213858(181-192)Online publication date: 20-May-2012
https://dl.acm.org/doi/10.1145/2213836.2213858
Liu ZNatarajan SChen Y(2011)Query expansion based on clustered resultsProceedings of the VLDB Endowment10.14778/1978665.19786674:6(350-361)Online publication date: 1-Mar-2011
https://dl.acm.org/doi/10.14778/1978665.1978667
Boim RMilo TNovgorodov S(2011)Diversification and refinement in collaborative filtering recommenderProceedings of the 20th ACM international conference on Information and knowledge management10.1145/2063576.2063684(739-744)Online publication date: 24-Oct-2011
https://dl.acm.org/doi/10.1145/2063576.2063684
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents