
research-article

Discovering the skyline of web databases

Authors:

Abolfazl Asudeh,

Saravanan Thirumuruganathan,

Gautam Das

Proceedings of the VLDB Endowment, Volume 9, Issue 7

Pages 600 - 611

https://doi.org/10.14778/2904483.2904491

Published: 01 March 2016

Abstract

Many web databases are "hidden" behind proprietary search interfaces that enforce a top-k output constraint: each query returns at most k of the matching tuples, selected and ordered according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-k hidden web databases. Since skyline tuples provide critical insights into the database and include the top-ranked tuple for every possible ranking function that is monotonic in the attribute values, skyline discovery from a hidden web database can enable a wide variety of innovative third-party applications over one or multiple web databases. Our research shows that the critical factor affecting the cost of skyline discovery is the type of search interface controls provided by the website. We therefore develop efficient algorithms for the three most popular types, i.e., one-ended range, free range, and point predicates, and then combine them to support web databases that feature a mixture of these types. Rigorous theoretical analysis and extensive real-world online and offline experiments demonstrate the effectiveness of our proposed techniques and their superiority over baseline solutions.
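The two key facts in the abstract, that a top-k interface hides all but k ranked tuples and that the skyline contains the top-ranked tuple of every monotonic ranking function, can be made concrete in a small simulation. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithms: it assumes smaller attribute values are preferred, a hidden ranking that is strictly monotone (here a hypothetical positive-weight linear score), distinct tuples, and an interface that accepts only conjunctive one-ended predicates of the form A_i < v. `TopKInterface` and `discover_skyline` are hypothetical names. Under these assumptions, the top-1 answer to any such query cannot be dominated (a dominating tuple would also satisfy the predicates and outrank it), so it is a skyline tuple, and every other skyline tuple must beat it on some attribute. That observation yields a simple recursive baseline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(rows: Sequence[Row]) -> List[Row]:
    """Quadratic skyline over a fully visible table; used here only as ground truth."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


class TopKInterface:
    """Hypothetical stand-in for a hidden database's search form.

    Accepts only conjunctive one-ended predicates A_i < v and returns at most k
    matching rows, ordered by a ranking function the client cannot see.
    """

    def __init__(self, rows: Sequence[Row], k: int, rank: Callable[[Row], float]):
        self._rows = list(rows)
        self._k = k
        self._rank = rank
        self.queries_issued = 0

    def query(self, upper: Dict[int, float]) -> List[Row]:
        self.queries_issued += 1
        matches = [r for r in self._rows if all(r[i] < v for i, v in upper.items())]
        return sorted(matches, key=self._rank)[: self._k]


def discover_skyline(db: TopKInterface, dims: int) -> List[Row]:
    """Naive recursive baseline (an illustration, not the paper's algorithms).

    Assumes a strictly monotone hidden ranking and distinct rows. Then the top-1
    answer of an upper-bound query is a skyline tuple, and any other skyline tuple
    in the same region is strictly smaller on some attribute i, so we branch on
    the extra predicate A_i < top[i] for each i.
    """
    found = set()
    stack: List[Dict[int, float]] = [{}]  # each entry maps attribute index -> upper bound
    while stack:
        upper = stack.pop()
        result = db.query(upper)
        if not result:
            continue
        top = result[0]  # only the first-ranked row is used
        found.add(top)
        for i in range(dims):
            child = dict(upper)
            child[i] = top[i]  # top satisfies upper, so top[i] tightens any existing bound on i
            stack.append(child)
    return sorted(found)


if __name__ == "__main__":
    rows = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 2), (7, 5), (9, 1)]
    # Hypothetical hidden ranking: a positive-weight linear score, hence strictly monotone.
    db = TopKInterface(rows, k=3, rank=lambda r: 2 * r[0] + r[1])
    sky = discover_skyline(db, dims=2)
    print(sky)  # [(1, 9), (2, 7), (4, 4), (6, 2), (9, 1)]
    print(sky == skyline(rows), db.queries_issued)  # True 11
```

Each child query excludes the tuple just returned, so the recursion terminates, but it can query overlapping regions repeatedly and its query count can grow quickly with the number of attributes. It illustrates correctness, not efficiency.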


Cited By

Chang J, Cui B, Nargesian F, Asudeh A, Jagadish H (2024). Data distribution tailoring revisited: cost-efficient integration of representative data. The VLDB Journal: The International Journal on Very Large Data Bases 33(5):1283-1306. DOI: 10.1007/s00778-024-00849-w. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1007/s00778-024-00849-w
Asudeh A, Das G, Jagadish H, Lu S, Nazi A, Tao Y, Zhang N, Zhao J (2022). On Finding Rank Regret Representatives. ACM Transactions on Database Systems 47(3):1-37. DOI: 10.1145/3531054. Online publication date: 18-Aug-2022.
https://dl.acm.org/doi/10.1145/3531054
Nargesian F, Asudeh A, Jagadish H (2021). Tailoring data source distributions for fairness-aware data integration. Proceedings of the VLDB Endowment 14(11):2519-2532. DOI: 10.14778/3476249.3476299. Online publication date: 27-Oct-2021.
https://dl.acm.org/doi/10.14778/3476249.3476299



Information & Contributors

Information

Published In


Proceedings of the VLDB Endowment Volume 9, Issue 7

March 2016

96 pages

ISSN:2150-8097

Editors:
Surajit Chaudhuri, Microsoft Research
Jayant Haritsa, Bangalore


Publisher

VLDB Endowment

Publication History

Published: 01 March 2016

Published in PVLDB Volume 9, Issue 7

Qualifiers

Research-article


Bibliometrics & Citations

Article Metrics

Total Citations: 16
Total Downloads: 105
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 13 Dec 2024


Citations

Cited By

Murugudu M, Reddy L (2021). RETRACTED ARTICLE: Efficiently harvesting deep web interfaces based on adaptive learning using two-phase data crawler framework. Soft Computing - A Fusion of Foundations, Methodologies and Applications 27(1):505-515. DOI: 10.1007/s00500-021-05816-z. Online publication date: 6-May-2021.
https://dl.acm.org/doi/10.1007/s00500-021-05816-z
Li R, Qin L, Ye F, Wang G, Yu J, Xiao X, Xiao N, Zheng Z (2020). Finding skyline communities in multi-valued networks. The VLDB Journal: The International Journal on Very Large Data Bases 29(6):1407-1432. DOI: 10.1007/s00778-020-00618-5. Online publication date: 8-Jun-2020.
https://dl.acm.org/doi/10.1007/s00778-020-00618-5
Shetiya S, Asudeh A, Ahmed S, Das G (2019). A unified optimization algorithm for solving "regret-minimizing representative" problems. Proceedings of the VLDB Endowment 13(3):239-251. DOI: 10.14778/3368289.3368291. Online publication date: 1-Nov-2019.
https://dl.acm.org/doi/10.14778/3368289.3368291
Asudeh A, Nazi A, Zhang N, Das G, Jagadish H (2019). RRR. Proceedings of the 2019 International Conference on Management of Data (eds. Boncz P, Manegold S, Ailamaki A, Deshpande A, Kraska T), 263-280. DOI: 10.1145/3299869.3300080. Online publication date: 25-Jun-2019.
https://dl.acm.org/doi/10.1145/3299869.3300080
Asudeh A, Jagadish H, Stoyanovich J, Das G (2019). Designing Fair Ranking Schemes. Proceedings of the 2019 International Conference on Management of Data (eds. Boncz P, Manegold S, Ailamaki A, Deshpande A, Kraska T), 1259-1276. DOI: 10.1145/3299869.3300079. Online publication date: 25-Jun-2019.
https://dl.acm.org/doi/10.1145/3299869.3300079
Hernández I, Rivero C, Ruiz D (2019). Deep Web crawling. World Wide Web 22(4):1577-1610. DOI: 10.1007/s11280-018-0602-1. Online publication date: 1-Jul-2019.
https://dl.acm.org/doi/10.1007/s11280-018-0602-1
Han X, Wang B, Li J, Gao H (2019). Ranking the big sky. Knowledge and Information Systems 60(1):415-446. DOI: 10.1007/s10115-018-1256-0. Online publication date: 1-Jul-2019.
https://dl.acm.org/doi/10.1007/s10115-018-1256-0
