Key semantics extraction by dependency tree mining

Published: 21 August 2005

Abstract

We propose a new text mining system that extracts characteristic contents from given documents. We define Key semantics as characteristic sub-structures of the syntactic dependencies in the given documents, and consider the following three tasks in this paper:

1) Key semantics extraction: extracting characteristic syntactic dependency structures not only as ordered trees but also as unordered trees and free trees.
2) Redundancy reduction: deleting redundant dependency structures, such as sub-structures or equivalents of other extracted structures, from the extraction result.
3) Phrase/sentence reconstruction: generating a natural-language phrase or sentence corresponding to each extracted structure.

Our system combines natural language processing techniques with tree mining techniques. It consists of the following five units: 1) a syntactic dependency analysis unit, 2) input filters, 3) a characteristic ordered subtree extraction unit, 4) output filters, and 5) a phrase/sentence reconstruction unit. Although the third unit extracts ordered trees, the overall behavior of the system can be switched among the extraction of ordered trees, unordered trees, or free trees, depending on which input filters are applied in the second step. The output filters delete redundant trees from the extraction result for efficient knowledge discovery. Finally, phrases or sentences corresponding to the extracted subtrees are reconstructed by utilizing the input documents.

We demonstrate the validity of our system by showing experimental results on real data collected at a help desk and on the TDT pilot corpus.
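The abstract does not spell out how the output filters recognize "equivalent structures". As a minimal sketch only, and not the authors' method, the following assumes that two extracted patterns count as equivalent when they are the same rooted tree up to sibling order. The `(label, children)` tuple representation, the function names `canonical` and `drop_equivalent`, and the example words are all hypothetical and chosen for illustration; labels are assumed not to contain parentheses.

```python
from typing import List, Tuple

# Hypothetical representation: a tree is (label, [child trees]).
Tree = Tuple[str, list]

def canonical(tree: Tree) -> str:
    """Return a canonical string for a rooted, labeled, unordered tree."""
    label, children = tree
    # Sorting the children's encodings makes sibling order irrelevant,
    # so trees that differ only in sibling order get the same string.
    parts = sorted(canonical(child) for child in children)
    return "(" + label + "".join(parts) + ")"

def drop_equivalent(patterns: List[Tree]) -> List[Tree]:
    """Keep the first pattern from each group of equivalent trees."""
    seen = set()
    kept = []
    for pattern in patterns:
        key = canonical(pattern)
        if key not in seen:
            seen.add(key)
            kept.append(pattern)
    return kept

# Made-up dependency patterns: a and b differ only in sibling order.
a = ("fail", [("printer", []), ("suddenly", [])])
b = ("fail", [("suddenly", []), ("printer", [])])
c = ("fail", [("printer", [("new", [])])])

result = drop_equivalent([a, b, c])
```

Here `result` keeps `a` and `c` and discards `b`. Removing patterns that are sub-structures of other patterns, the other kind of redundancy the abstract mentions, would need a subtree-inclusion test and is not covered by this sketch.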

    Published In

    KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
    August 2005
    844 pages
    ISBN:159593135X
    DOI:10.1145/1081870

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. phrase/sentence reconstruction
    2. redundancy reduction
    3. syntactic dependency
    4. text mining
    5. tree enumeration
