Deep Web crawling: a survey

Abstract

Deep Web crawling refers to the problem of traversing the collection of pages in a deep Web site, which are dynamically generated in response to a particular query that is submitted using a search form. To achieve this, crawlers need to be endowed with some features that go beyond merely following links, such as the ability to automatically discover search forms that are entry points to the deep Web, fill in such forms, and follow certain paths to reach the deep Web pages with relevant information. Current surveys that analyse the state of the art in deep Web crawling do not provide a framework that allows comparing the most up-to-date proposals regarding all the different aspects involved in the deep Web crawling process. In this article, we propose a framework that analyses the main features of existing deep Web crawling-related techniques, including the most recent proposals, and provides an overall picture regarding deep Web crawling, including novel features that previous surveys have not analysed. Our main conclusion is that crawler evaluation is an immature research area due to the lack of a standard set of performance measures and of a benchmark or publicly available dataset to evaluate crawlers. In addition, we conclude that future work in this area should focus on devising crawlers that cope with ever-evolving Web technologies and on improving crawling efficiency and scalability, in order to create effective crawlers that can operate in real-world contexts.
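As a rough illustration of the first two capabilities named in the abstract (discovering search forms and filling them in), the following Python sketch may help. It is not taken from the article or from any surveyed crawler: the heuristic "a GET form with exactly one text field is a search form", the names `FormFinder`, `find_search_forms` and `build_query_url`, the sample page and the `example.org` URL are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collect each <form> together with the names of its text-like inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished forms, in document order
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "text_fields": [],
            }
        elif tag == "input" and self._current is not None:
            # An <input> without a type attribute behaves as a text field.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search") and attrs.get("name"):
                self._current["text_fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def find_search_forms(html):
    """Assumed heuristic: a GET form with exactly one text field is a search form."""
    finder = FormFinder()
    finder.feed(html)
    return [f for f in finder.forms
            if f["method"] == "get" and len(f["text_fields"]) == 1]


def build_query_url(page_url, form, keyword):
    """Fill the single text field with a keyword and build the result-page URL."""
    target = urljoin(page_url, form["action"])
    return target + "?" + urlencode({form["text_fields"][0]: keyword})


# Hypothetical page with one login form and one keyword search form.
PAGE = """
<form action="/login" method="post">
  <input type="text" name="user"><input type="password" name="pw">
</form>
<form action="/search" method="get">
  <input type="text" name="q"><input type="submit" value="Go">
</form>
"""

forms = find_search_forms(PAGE)
urls = [build_query_url("http://example.org/index.html", forms[0], kw)
        for kw in ("deep web", "crawler")]
print(urls)
```

A real crawler would then fetch each generated URL and follow the result links to reach the deep Web pages, and it would need far more robust form classification and keyword selection than this heuristic.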

Acknowledgements

The authors would like to thank Dr. Rafael Corchuelo for his support and assistance throughout the entire research process that led to this article, and for his helpful and constructive comments that greatly contributed to improving the article. They would also like to thank the anonymous reviewers of this and past submissions, since their comments have helped to shape the current version. Supported by the European Commission (FEDER) and the Spanish and Andalusian R&D&I programmes (grants TIN2016-75394-R and TIN2013-40848-R).

Author information

Authors and Affiliations

Department of Languages and Computer Systems, University of Seville, Seville, Spain
Inma Hernández & David Ruiz
Department of Computer Science, Rochester Institute of Technology, Rochester, NY, USA
Carlos R. Rivero

Corresponding author

Correspondence to Inma Hernández.

About this article

Cite this article

Hernández, I., Rivero, C.R. & Ruiz, D. Deep Web crawling: a survey. World Wide Web 22, 1577–1610 (2019). https://doi.org/10.1007/s11280-018-0602-1

Received: 22 May 2017
Revised: 15 May 2018
Accepted: 25 May 2018
Published: 05 June 2018
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11280-018-0602-1
