Formal concept analysis approach for data extraction from a limited deep web database

Zhuo Zhang¹,
Juan Du² &
Liming Wang¹

Abstract

Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context K_L, which consists of the most recently extracted data, and predicts the size of the query result from the cardinality of the extent X of the formal concept (X,Y) derived from K_L. This makes it possible to decide in advance whether Y qualifies as a query. Candidate query concepts are generated dynamically from the lower cover of the current concept (X,Y), so the method avoids building concrete concept lattices during extraction. In addition, two pruning rules are adopted to reduce redundant queries. Experiments were performed on controlled data sets and on real applications. The results confirm that the theory behind the algorithm is correct and that the algorithm can be applied effectively in the real world.
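
The formal-concept notions used in the abstract can be illustrated with a minimal sketch. This is not the authors' EdaliwdbFCA algorithm and it omits the pruning rules; it only shows the standard derivation operators and a lower-cover computation on a small made-up context. The record ids, the attribute values, and the function names (all_attributes, extent, intent, lower_cover) are assumptions introduced for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical local context K_L: record id -> attribute values extracted so far.
Context = Dict[str, FrozenSet[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent X, intent Y)


def all_attributes(ctx: Context) -> FrozenSet[str]:
    """Every attribute value that occurs in the context."""
    return frozenset().union(*ctx.values())


def extent(ctx: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Records that carry every attribute in attrs."""
    return frozenset(g for g, has in ctx.items() if attrs <= has)


def intent(ctx: Context, objs: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by every record in objs."""
    common = all_attributes(ctx)
    for g in objs:
        common &= ctx[g]
    return common


def lower_cover(ctx: Context, attrs: FrozenSet[str]) -> List[Concept]:
    """Lower neighbours of the concept whose intent is the closure of attrs."""
    y = intent(ctx, extent(ctx, attrs))
    candidates: Dict[FrozenSet[str], FrozenSet[str]] = {}
    for m in all_attributes(ctx) - y:
        x1 = extent(ctx, y | {m})
        candidates[x1] = intent(ctx, x1)
    # Keep only candidates whose extent is maximal among all candidates.
    return [(x1, y1) for x1, y1 in candidates.items()
            if not any(x1 < other for other in candidates)]


# Made-up records, purely illustrative.
K_L: Context = {
    "r1": frozenset({"make=ford", "color=red"}),
    "r2": frozenset({"make=ford", "color=blue"}),
    "r3": frozenset({"make=bmw", "color=red"}),
    "r4": frozenset({"make=ford", "color=red", "year=2010"}),
}

query = frozenset({"make=ford"})
local_size = len(extent(K_L, query))        # records in K_L matching the query
top_children = lower_cover(K_L, frozenset())  # candidates below the top concept
ford_children = lower_cover(K_L, query)       # candidates below the ford concept
```

Because K_L holds only data already extracted, local_size counts the matching records known locally; how that count is compared with the database's result limit is left out here, since the abstract does not state the rule.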

Notes

The DBLP Computer Science Bibliography. http://www.informatik.uni-trier.de/~ley/db/index.html, November, 2011.
http://web-harvest.sourceforge.net.
http://archive.ics.uci.edu/ml/datasets/Car+Evaluation.

Acknowledgements

I thank Dr. Lin Gan at the School of Computing in Wuhan University for providing helpful suggestions. I also thank the anonymous referees and editor for their constructive comments on earlier versions of the paper.

Author information

Authors and Affiliations

School of Information Engineering, ZhengZhou University, 450001, ZhengZhou, China
Zhuo Zhang & Liming Wang
Information Technology Engineering, Yellow River Conservancy Technical Institute, 475003, Kaifeng, China
Juan Du

Corresponding author

Correspondence to Zhuo Zhang.

About this article

Cite this article

Zhang, Z., Du, J. & Wang, L. Formal concept analysis approach for data extraction from a limited deep web database. J Intell Inf Syst 41, 211–234 (2013). https://doi.org/10.1007/s10844-013-0242-y

Received: 03 July 2012
Revised: 21 November 2012
Accepted: 05 March 2013
Published: 24 March 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s10844-013-0242-y
