On Discovering Concept Entities from Web Sites

Ming Yin,
Dion Hoe-Lian Goh &
Ee-Peng Lim

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3481)

Included in the following conference series:

International Conference on Computational Science and Its Applications


Abstract

A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. To discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine the web pages that constitute a concept entity and to classify concept entities into categories. However, the performance of an existing web unit mining algorithm, iWUM, suffers because it may split a single concept entity into more than one web unit (incomplete web units). This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover incomplete web units and handle them by merging them together and assigning correct labels. Experiments show that kWUM significantly improves overall accuracy.
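
As an illustration only, the merging step described in the abstract might be sketched as follows. The abstract does not say what site-specific knowledge kWUM uses, so this sketch assumes a hypothetical rule: web units whose first page lies in the same URL directory are treated as fragments of one concept entity, and the merged unit takes the label supported by the most pages. The names WebUnit, directory_of, and merge_incomplete_units are invented for this example and do not come from the paper.

```python
import posixpath
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class WebUnit:
    """A set of pages believed to describe one concept entity, plus a category label."""
    pages: List[str]  # page URLs
    label: str        # e.g. "faculty", "course"


def directory_of(url: str) -> str:
    """Return the directory part of a URL path (hypothetical grouping key)."""
    return posixpath.dirname(urlparse(url).path)


def merge_incomplete_units(units: List[WebUnit]) -> List[WebUnit]:
    """Merge web units that share a URL directory and relabel each merged unit.

    ASSUMPTION: sharing a directory stands in for the paper's site-specific
    knowledge, which the abstract does not specify.
    """
    groups = defaultdict(list)
    for unit in units:
        # Use the unit's first page as its representative page.
        groups[directory_of(unit.pages[0])].append(unit)

    merged = []
    for group in groups.values():
        pages = [page for unit in group for page in unit.pages]
        # Majority vote over labels, weighted by the number of pages per unit.
        votes = Counter()
        for unit in group:
            votes[unit.label] += len(unit.pages)
        merged.append(WebUnit(pages=pages, label=votes.most_common(1)[0][0]))
    return merged


if __name__ == "__main__":
    example = [
        WebUnit(["http://example.edu/~lee/index.html",
                 "http://example.edu/~lee/pubs.html"], "faculty"),
        WebUnit(["http://example.edu/~lee/cv.html"], "student"),
        WebUnit(["http://example.edu/courses/cs101/index.html"], "course"),
    ]
    for unit in merge_incomplete_units(example):
        print(unit.label, unit.pages)
```

In this toy input, the two fragments under the same home directory collapse into one unit, which mirrors the idea of repairing incomplete web units. A real system would need a far richer notion of site structure than a directory match.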




Author information

Authors and Affiliations

Division of Information Studies, School of Communication and Information, Nanyang Technological University, 639798, Singapore
Ming Yin & Dion Hoe-Lian Goh
Centre for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Ee-Peng Lim


Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi
Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, P.O. Box, I-06123, Perugia, Italy
Antonio Laganà
Institute of High Performance Computing, IHCP, 1 Science Park Road, 01-01 The Capricorn, Singapore Science Park II, 117528, Singapore
Heow Pueh Lee
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
Clayton School of IT, Monash University, 3800, Clayton, Australia
David Taniar
OptimaNumerics Ltd, P.O. Box, Belfast, United Kingdom
Chih Jeng Kenneth Tan


About this paper

Cite this paper

Yin, M., Goh, D.HL., Lim, EP. (2005). On Discovering Concept Entities from Web Sites. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424826_125


DOI: https://doi.org/10.1007/11424826_125
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25861-2
Online ISBN: 978-3-540-32044-9
eBook Packages: Computer Science, Computer Science (R0)
