[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1247480.1247496acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Supporting ranking and clustering as generalized order-by and group-by

Published: 11 June 2007 Publication History

Abstract

The Boolean semantics of SQL queries cannot adequately capture the "fuzzy" preferences and "soft" criteria required in non-traditional data retrieval applications. One way to solve this problem is to add a flavor of "information retrieval" into database queries by allowing fuzzy query conditions and flexibly supporting grouping and ranking of the query results within the DBMS engine. While ranking is already supported by all major commercial DBMSs natively, support of flexibly grouping is still very limited (i.e., group-by).
In this paper, we propose to generalize group-by to enable flexible grouping (clustering specifically) of the query results. Different from clustering in data mining applications, our focus is on supporting efficient clustering of Boolean results generated at query time. Moreover, we propose to integrate ranking and clustering with Boolean conditions, forming a new type of ClusterRank query to allow structured data retrieval. Such an integration is nontrivial in terms of both semantics and query processing. We investigate various semantics of this type of queries. To process such queries, a straightforward approach is to simply glue the techniques developed for ranking-only and clustering-only together. This approach is costly since both ranking and clustering are treated as blocking post-processing tasks upon Boolean query results by existing techniques. We propose a summary-based evaluation method that utilizes bitmap index to seamlessly integrate Boolean conditions, clustering, and ranking. Experimental study shows that our approach significantly outperforms the straightforward one and maintains high clustering quality.

References

[1]
S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis. Automated ranking of database query results. In CIDR, 2003.
[2]
P. Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, 2002.
[3]
M. J. Carey and D. Kossmann. On saying "enough already!" in SQL. In SIGMOD, pages 219--230, 1997.
[4]
K. Chakrabarti, S. Chaudhuri, and S. Hwang. Automatic categorization of query results. In SIGMOD, pages 755--766, 2004.
[5]
C. Y. Chan and Y. E. Ioannidis. An efficient bitmap encoding scheme for selection queries. In SIGMOD, 1999.
[6]
S. Chaudhuri and L. Gravano. Evaluating top-k selection queries. In VLDB, pages 397--410, 1999.
[7]
S. Chaudhuri, R. Ramakrishnan, and G. Weikum. Integrating DB and IR technologies: What is the sound of one hand clapping? In CIDR, pages 1--12, 2005.
[8]
D. Donjerkovic and R. Ramakrishnan. Probabilistic optimization of top n queries. In VLDB, 1999.
[9]
R. Fagin, A. Lote, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001.
[10]
F. Farnstrom, J. Lewis, and C. Elkan. Scalability for clustering algorithms revisited. 2(1):51--57, August 2000.
[11]
V. Ganti, J. Gehrke, and R. Ramakrishnan. CACTUS-clustering categorical data using summaries. In KDD, pages 73--83, 1999.
[12]
M. Gavrilov, D. Anguelov, P. Indyk, and R. Motwani. Mining the stock market (extended abstract): which measure is best? In SIGKDD, pages 487--496, 2000.
[13]
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, New York, 2000.
[14]
V. Hristidis, N. Koudas, and Y. Papakonstantinou. PREFER: A system for the efficient execution of multi-parametric ranked queries. SIGMOD, 2001.
[15]
I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. Supporting top-k join queries in relational databases. In VLDB, pages 754--765, 2003.
[16]
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264--323, 1999.
[17]
K. Kerdprasop, N. Kerdprasop, and P. Sattayatham. Weighted k-means for density-biased clustering. In DaWaK, pages 488--497, 2005.
[18]
B. Larsen and C. Aone. Fast and effective text mining using linear-time document clustering. In SIGKDD, pages 16--22, 1999.
[19]
A. Leuski and J. Allan. Improving interactive retrieval by combining ranked lists and clustering. In RIAO, pages 665--681, 2000.
[20]
C. Li, K. C. C. Chang, I. F. Ilyas, and S. Song. RankSQL: Query algebra and optimization for relational top-k queries. In SIGMOD, pages 131--142, 2005.
[21]
F. Morii. A generalized k-means algorithm with semi-supervised weight coefficients. In ICPR, pages 198--201, 2006.
[22]
P. E. O'Neil. Model 204 architecture and performance. In Proceedings of the 2nd International Workshop on High Performance Transaction Systems, pages 40--59, 1987.
[23]
P. E. O'Neil and G. Graefe. Multi-table joins through bitmapped join indices. SIGMOD Record, 24(3):8--11, 1995.
[24]
P. E. O'Neil and D. Quass. Improved query performance with variant indexes. In SIGMOD, pages 38--49, 1997.
[25]
G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In VLDB, pages 428--439, 1998.
[26]
R. R. Sinha, S. Mitra, and M. Winslett. Bitmap indexes for large scientific data sets: A case study. In IPDPS, 2006.
[27]
W. Wang, J. Yang, and R. R. Muntz. STING: A statistical information grid approach to spatial data mining. In VLDB, pages 186--195, 1997.
[28]
K. Wu, E. Otoo, and A. Shoshani. Optimizing bitmap indices with efficient compression. ACM TODS, 31(1):1--38, 2006.
[29]
M. C. Wu and A. P. Buchmann. Encoded bitmap indexing for data warehouses. In ICDE, pages 220--230, 1998.
[30]
F. Zemke, K. Kulkarni, A. Witkowski, and B. Lyle. Introduction to OLAP function. Change proposal. ANS-NCTS H2-99-14 (April), 1999.
[31]
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In SIGMOD, pages 103--114, 1996.

Cited By

View all
  • (2024)Similarity joins and clustering for SPARQLSemantic Web10.3233/SW-24354015:5(1701-1732)Online publication date: 9-Oct-2024
  • (2021)Una Implementación para el Agrupamiento Difuso en SQLRevista Tecnica De La Facultad De Ingenieria Universidad Del Zulia10.22209/rt.v44n1a0544:1(36-43)Online publication date: 1-Jan-2021
  • (2020)Fuzzy Analytical Queries: A New Approach to Flexible Fuzzy Queries2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)10.1109/FUZZ48607.2020.9177556(1-8)Online publication date: Jul-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
  • General Chairs:
  • Lizhu Zhou,
  • Tok Wang Ling,
  • Program Chair:
  • Beng Chin Ooi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. clustering
  2. data exploration
  3. grouping
  4. query processing
  5. ranking
  6. retrieval
  7. top-k

Qualifiers

  • Article

Conference

SIGMOD/PODS07
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)11
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Similarity joins and clustering for SPARQLSemantic Web10.3233/SW-24354015:5(1701-1732)Online publication date: 9-Oct-2024
  • (2021)Una Implementación para el Agrupamiento Difuso en SQLRevista Tecnica De La Facultad De Ingenieria Universidad Del Zulia10.22209/rt.v44n1a0544:1(36-43)Online publication date: 1-Jan-2021
  • (2020)Fuzzy Analytical Queries: A New Approach to Flexible Fuzzy Queries2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)10.1109/FUZZ48607.2020.9177556(1-8)Online publication date: Jul-2020
  • (2017)Model-Based Diversification for Sequential Exploratory QueriesData Science and Engineering10.1007/s41019-017-0038-02:2(151-168)Online publication date: 27-Mar-2017
  • (2017)Semantic Similarity Group By Operators for Metric DataSimilarity Search and Applications10.1007/978-3-319-68474-1_17(247-261)Online publication date: 28-Sep-2017
  • (2015)Information Exploration in E-Commerce DatabasesProceedings of the 4th International Conference on Big Data Analytics - Volume 949810.1007/978-3-319-27057-9_3(41-56)Online publication date: 15-Dec-2015
  • (2013)A Learning Approach to SQL Query Results Ranking Using Skyline and Users' Current Navigational BehaviorIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2012.12825:12(2683-2693)Online publication date: 1-Dec-2013
  • (2013)Cluster-By: An Efficient Clustering Operator in Emergency Management Database SystemsWeb-Age Information Management10.1007/978-3-642-39527-7_17(152-164)Online publication date: 2013
  • (2012)Query Processing over Uncertain DatabasesSynthesis Lectures on Data Management10.2200/S00465ED1V01Y201212DTM0334:6(1-101)Online publication date: 15-Dec-2012
  • (2012)SkimmerProceedings of the 2012 ACM SIGMOD International Conference on Management of Data10.1145/2213836.2213858(181-192)Online publication date: 20-May-2012
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media