[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Using trees to depict a forest

Published: 01 August 2009 Publication History

Abstract

When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the "best" results appear first. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined.
An alternative approach to the first page is not to try to show the best results, but instead to help users learn what is available in the whole result set and direct them to finding what they need. In this paper, we demonstrate through a user study that a page comprising one representative from each of k clusters (generated through a k-medoid clustering) is superior to multiple alternative candidate methods for generating representatives of a data set.
Users often refine query specifications based on returned results. Traditional clustering may lead to completely new representatives after a refinement step. Furthermore, clustering can be computationally expensive. We propose a tree-based method for efficiently generating the representatives, and smoothly adapting them with query refinement. Experiments show that our algorithms outperform the state-of-the-art in both result quality and efficiency.

References

[1]
Database usability research at university of michigan. http://www.eecs.umich.edu/db/usable/.
[2]
E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR, pages 3--10, 2006.
[3]
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD Conference, pages 94--105, 1998.
[4]
A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In ICML, pages 97--104, 2006.
[5]
S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic information retrieval approach for ranking of database query results. ACM Trans. Database Syst., 31(3):1134--1168, 2006.
[6]
M. Ester, H. Kriegel, and X. Xu. A Database Interface for Clustering in Large Spatial Databases. Inst. für Informatik, 1995.
[7]
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.
[8]
M. Ester, H.-P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification. In SSD, pages 67--82, 1995.
[9]
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001.
[10]
M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness. WH Freeman San Francisco, 1979.
[11]
S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In SIGMOD Conference, pages 73--84, 1998.
[12]
M. Hua, J. Pei, A. W.-C. Fu, X. Lin, and H. fung Leung. Efficiently answering top-k typicality queries on large databases. In VLDB, pages 890--901, 2007.
[13]
B. J. Jansen and A. Spink. How are we searching the world wide web? a comparison of nine search engine transaction logs. Inf. Process. Manage., 42(1):248--263, 2006.
[14]
S. Kandel, A. Paepcke, M. Theobald, H. Garcia-Molina, and E. Abelson. Photospread: a spreadsheet for managing photos. In CHI, pages 1749--1758, 2008.
[15]
L. Kaufman and P. Rousseeuw. Finding groups in data. an introduction to cluster analysis. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics, New York: Wiley, 1990.
[16]
C. Li, M. Wang, L. Lim, H. Wang, and K. C.-C. Chang. Supporting ranking and clustering as generalized order-by and group-by. In SIGMOD Conference, pages 127--138, 2007.
[17]
B. Liu and H. Jagadish. A spreadsheet algebra for a direct data manipulation query interface. In ICDE, 2009.
[18]
B. Liu and H. V. Jagadish. Datalens: Making a good first impression. In SIGMOD Conference, Demonstration Track, 2009.
[19]
B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Trans. Knowl. Data Eng., 13(1):124--141, 2001.
[20]
F. Morii. A generalized k-means algorithm with semi-supervised weight coefficients. In ICPR (3), pages 198--201, 2006.
[21]
K. Mouratidis, D. Papadias, and S. Papadimitriou. Medoid queries in large spatial databases. In SSTD, pages 55--72, 2005.
[22]
K. Mouratidis, D. Papadias, and S. Papadimitriou. Tree-based partition querying: a methodology for computing medoids in large spatial datasets. VLDB J., 17(4):923--945, 2008.
[23]
R. T. Ng and J. Han. Clarans: A method for clustering objects for spatial data mining. IEEE Trans. on Knowl. and Data Eng., 14(5):1003--1016, 2002.
[24]
R. Nosofsky and S. Zaki. Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Learning, Memory, 28(5):924--940, 2002.
[25]
C. R. Palmer and C. Faloutsos. Density biased sampling: An improved method for data mining and clustering. In SIGMOD Conference, pages 82--92, 2000.
[26]
F. Pan, W. Wang, A. K. H. Tung, and J. Yang. Finding representative set from massive data. In ICDM, pages 338--345, 2005.
[27]
L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1):90--105, 2004.
[28]
H. Shin and R. Nosofsky. Similarity-scaling studies of dot-pattern classification and recognition. Journal of Experimental Psychology: General, 121(3):278--304, 1992.
[29]
B. Shneiderman. Direct manipulation: a step beyond programming languages. IEEE Computer, 16(8):57--69, 1983.
[30]
J. Smith and J. Minda. Prototypes in the mist: The early epochs of category learning. Learning, Memory, 24(6):1411--1436, 1998.
[31]
E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. Amer-Yahia. Efficient computation of diverse query results. In ICDE, pages 228--236, 2008.
[32]
T. Wu, X. Li, D. Xin, J. Han, J. Lee, and R. Redder. Datascope: Viewing database contents in google maps' way. In VLDB, pages 1314--1317, 2007.
[33]
R. Xu and I. Donald Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645, 2005.
[34]
T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An efficient data clustering method for very large databases. In SIGMOD Conference, pages 103--114, 1996.

Cited By

View all
  • (2022)SubTab: Data Exploration with Informative Sub-TablesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3520154(2369-2372)Online publication date: 10-Jun-2022
  • (2015)Multiple Radii DisC DiversityACM Transactions on Database Systems10.1145/269949940:1(1-43)Online publication date: 25-Mar-2015
  • (2015)Information Exploration in E-Commerce DatabasesProceedings of the 4th International Conference on Big Data Analytics - Volume 949810.1007/978-3-319-27057-9_3(41-56)Online publication date: 15-Dec-2015
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 2, Issue 1
August 2009
1293 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2009
Published in PVLDB Volume 2, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)SubTab: Data Exploration with Informative Sub-TablesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3520154(2369-2372)Online publication date: 10-Jun-2022
  • (2015)Multiple Radii DisC DiversityACM Transactions on Database Systems10.1145/269949940:1(1-43)Online publication date: 25-Mar-2015
  • (2015)Information Exploration in E-Commerce DatabasesProceedings of the 4th International Conference on Big Data Analytics - Volume 949810.1007/978-3-319-27057-9_3(41-56)Online publication date: 15-Dec-2015
  • (2014)Diversifying Citation RecommendationsACM Transactions on Intelligent Systems and Technology10.1145/26681065:4(1-21)Online publication date: 15-Dec-2014
  • (2013)Diversity based relevance feedback for time series searchProceedings of the VLDB Endowment10.14778/2732228.27322307:2(109-120)Online publication date: 1-Oct-2013
  • (2012)DisC diversityProceedings of the VLDB Endowment10.14778/2428536.24285386:1(13-24)Online publication date: 1-Nov-2012
  • (2012)Dynamic diversification of continuous dataProceedings of the 15th International Conference on Extending Database Technology10.1145/2247596.2247623(216-227)Online publication date: 27-Mar-2012
  • (2012)SkimmerProceedings of the 2012 ACM SIGMOD International Conference on Management of Data10.1145/2213836.2213858(181-192)Online publication date: 20-May-2012
  • (2011)Query expansion based on clustered resultsProceedings of the VLDB Endowment10.14778/1978665.19786674:6(350-361)Online publication date: 1-Mar-2011
  • (2011)Diversification and refinement in collaborative filtering recommenderProceedings of the 20th ACM international conference on Information and knowledge management10.1145/2063576.2063684(739-744)Online publication date: 24-Oct-2011
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media