DOI: 10.1007/11605126_5
Article

Specific-Purpose web searches on the basis of structure and contents

Published: 01 May 2005

Abstract

We introduce methods for two specific-purpose Web searches: one searches for Web communities related to given keywords, and the other searches for texts that have a certain relation to given keywords. Both methods draw on the structure and the contents of the WWW. Our Web community search uses the global structure of the WWW, that is, the Web graph whose nodes are Web pages and whose edges are the hyperlinks between them, to discover communities, and uses content information to label the communities it finds. Our related-text search, by contrast, uses the local structure of the WWW, that is, the DOM-tree structure of each page, to extract candidate texts, and uses content information to filter out wrongly extracted ones. We report the latest results obtained with these Web search methods.
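The two-stage pattern described above (structure to find or extract, content to label or filter) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on simplifying assumptions: connected components of the undirected link graph stand in for community discovery, a community's label is simply its most frequent words, candidate texts are the text nodes that share a DOM tag path with an occurrence of the keyword, and the content filter keeps only candidates containing a caller-supplied list of cue words. The names `find_communities`, `label_community`, `TagPathCollector`, `related_texts`, and `VOID_TAGS` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# ---- Community search: global structure (Web graph) + content labels ----

def find_communities(links):
    """Group pages into connected components of the undirected link graph.

    A simple stand-in for community discovery, NOT the paper's partitioning method.
    """
    adj = {}
    for src, dst in links:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    seen, communities = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search
            page = stack.pop()
            component.add(page)
            for neighbor in adj[page]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(component)
    return communities


def label_community(pages, texts, k=2):
    """Label a community with the k most frequent words of its pages' text."""
    counts = Counter()
    for page in sorted(pages):  # sorted so that ties break deterministically
        counts.update(texts.get(page, "").lower().split())
    return [word for word, _ in counts.most_common(k)]


# ---- Related-text search: local structure (DOM tree) + content filter ----

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag


class TagPathCollector(HTMLParser):
    """Record each non-empty text node with its tag path, e.g. 'html/body/ul/li'."""

    def __init__(self):
        super().__init__()
        self.path, self.items = [], []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        if tag in self.path:
            # Pop up to and including the matching open tag (tolerates unclosed tags).
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.path), text))


def related_texts(html, keyword, cue_words):
    """Extract texts sharing a tag path with the keyword, then filter by content."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    key = keyword.lower()
    # Structure step: tag paths under which the keyword occurs.
    paths = {path for path, text in collector.items if key in text.lower()}
    # Candidates: every text node under one of those paths (e.g. sibling <li>s).
    candidates = [text for path, text in collector.items if path in paths]
    # Content step: keep only candidates containing at least one cue word.
    return [t for t in candidates if any(w in t.lower() for w in cue_words)]
```

For example, `find_communities([("a", "b"), ("b", "c"), ("d", "e")])` returns `[{"a", "b", "c"}, {"d", "e"}]`. Matching on the full tag path is a deliberately crude notion of "local structure"; it captures the common case where related texts appear as sibling items of one list or table, and the content filter then discards structurally similar but irrelevant items.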


      Published In

      Proceedings of the 2005 international conference on Federation over the Web
      May 2005
      214 pages
      ISBN: 3540310185
      Editors: Klaus P. Jantke, Aran Lunzer, Nicolas Spyratos, Yuzuru Tanaka

      Publisher

      Springer-Verlag, Berlin, Heidelberg
