Article

Specific-Purpose web searches on the basis of structure and contents

Authors:

Atsuyoshi Nakamura

Proceedings of the 2005 international conference on Federation over the Web

Pages 79 - 96

https://doi.org/10.1007/11605126_5

Published: 01 May 2005

Abstract

We introduce methods for two specific-purpose Web searches. One is a search for Web communities related to given keywords, and the other is a search for texts that have a certain relation to given keywords. Both methods draw on the structure and the contents of the WWW. Our Web community search uses the global structure of the WWW to discover communities and uses content information to label the communities it finds; here, global structure means the Web graph composed of Web pages and the hyperlinks between them. Our related text search, in contrast, uses the local structure of the WWW to extract candidate texts and uses content information to filter out wrongly extracted ones; here, local structure means the DOM-tree structure of each page. We report the latest results on these Web search methods.
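The community search described in the abstract has two stages: discover communities from the Web graph, then label them from page contents. The following Python sketch illustrates that split under loud assumptions. The abstract does not say which discovery or labeling algorithm is used, so connected components of the undirected link graph stand in for community discovery and the most frequent words stand in for labeling. The function names `find_communities` and `label_community` and the toy pages and links are hypothetical.

```python
from collections import Counter, defaultdict


def find_communities(links):
    """Structure stage (assumed stand-in): group pages into connected
    components of the undirected graph induced by the hyperlinks."""
    neighbors = defaultdict(set)
    for src, dst in links:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen, communities = set(), []
    for start in sorted(neighbors):  # sorted for a deterministic order
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(neighbors[page] - component)
        seen |= component
        communities.append(component)
    return communities


def label_community(pages, page_text, top_n=2):
    """Content stage (assumed stand-in): label a community with the
    most frequent words across the text of its pages."""
    counts = Counter()
    for page in pages:
        counts.update(page_text.get(page, "").lower().split())
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical toy Web graph: (source page, target page) hyperlinks.
links = [("a", "b"), ("b", "c"), ("x", "y")]
# Hypothetical page contents.
page_text = {
    "a": "jazz piano",
    "b": "jazz trio",
    "c": "jazz club",
    "x": "chess opening",
    "y": "chess endgame",
}

communities = find_communities(links)
labels = [label_community(c, page_text, top_n=1) for c in communities]
```

On this toy input the structure stage separates the pages into two groups without looking at their text, and the content stage is only consulted afterwards to name each group. A keyword-driven search could then match the given keywords against these labels.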


Index Terms: Computing methodologies; Information systems

Published In

Proceedings of the 2005 international conference on Federation over the Web

May 2005

214 pages

ISBN: 3540310185

Editors:
Klaus P. Jantke, Meme Media Laboratory, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo, Japan
Aran Lunzer, Meme Media Laboratory, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo, Japan
Nicolas Spyratos, Laboratoire de Recherche en Informatique, Université Paris-Sud, Orsay Cedex, France
Yuzuru Tanaka, Meme Media Laboratory, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo, Japan

Publisher

Springer-Verlag

Berlin, Heidelberg
